Apparatus, kits and methods for the prediction of onset of sepsis

ABSTRACT

The present invention provides kits, methods, and apparatus for analysing a biological sample from an animal to predict (pre-symptomatically) and monitor the development of sepsis, utilising biomarker signatures, and especially biomarker signatures capable of providing a mean predictive accuracy of at least 92% to differentiate development of sepsis from non-sepsis.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/117,923, filed on Aug. 10, 2016, which is a U.S. national phase of International Patent Application No. PCT/GB2015/000004, filed on Jan. 9, 2015, which claims the benefit of United Kingdom Patent Application No. 1402293.3, filed on Feb. 11, 2014, each of which is incorporated herein by reference in its entirety.

FIELD

The present invention is concerned with kits, methods and apparatus for analysing a biological sample from an animal to predict and monitor the development of sepsis utilising biomarker signatures/lists of biomarkers to predict whether an animal is likely to develop the symptoms of sepsis, and especially biomarker signatures capable of providing a mean predictive accuracy of at least 92% to differentiate development of sepsis from non-sepsis, and of at least 95% to differentiate development of sepsis from SIRS.

BACKGROUND

Following exposure to a biological agent there is often a lag phase before symptoms of sepsis present. After the onset of clinical symptoms, the effectiveness of treatment often decreases as the disease progresses, so the time taken to make any diagnosis is critical. It is likely that a detection or diagnostic assay will be the first confirmed indicator of sepsis. The availability, rapidity and predictive accuracy of such an assay will therefore be crucial in determining the outcome. Any time saved will speed up the implementation of medical countermeasures and will have a significant impact on recovery.

The development of technologies to facilitate rapid detection of biological agent infection is a key concern for all at risk. During the initial stages of infection many biological agents are either absent from, or present at very low concentrations in, typical clinical samples (e.g. blood). It is therefore likely that agent-specific assays would have limited utility in detecting infection before clinical symptoms arise. Previous studies have shown that infection elicits a pattern of immune response involving changes in the expression of a variety of biomarkers that is indicative of the type of agent. Such patterns of biomarker expression have proven to be diagnostic for a variety of infectious agents. It is now possible to distinguish patterns of gene expression in blood leukocytes from symptomatic patients with acute infections caused by four common human pathogens (Influenza A, Staphylococcus aureus, Streptococcus pneumoniae and Escherichia coli) using whole transcriptome analysis. More recently, researchers have been able to reduce the number of host biomarkers required to make a diagnosis through use of appropriate bioinformatic analysis techniques to select key biomarkers for the diagnosis of infectious disease.

While host biomarker signatures represent an attractive solution for the pre-symptomatic detection of biological agent infection, their discovery relies on the exploitation of laboratory models of infection whose fidelity to the pathogenesis of disease in humans varies. An alternative approach for pre-symptomatic biomarker discovery in humans is to exploit a common sequela of biological agent infection: the life-threatening condition sepsis. Sepsis is traditionally defined as a systemic inflammatory response syndrome (SIRS) in response to infection which, when associated with acute organ dysfunction, may ultimately cause severe life-threatening complications. This broad definition relies on observation of overt symptoms of systemic illness (temperature, blood pressure, heart rate, etc.) as well as the indication of the presence of an infectious organism through microbial culture from clinical samples. It has been described in animal (primarily murine and NHP) models of anthrax (Bacillus anthracis), tularemia (Francisella tularensis), plague (Yersinia pestis), glanders (Burkholderia mallei), melioidosis (B. pseudomallei), haemorrhagic filovirus and alphavirus infection. More importantly, sepsis is directly caused by the same biological agents in humans.

The incidence of natural biological agent infection is generally extremely low, making prospective studies of the onset of disease in a human population non-viable. However, the development of severe sepsis, associated with organ dysfunction, hypoperfusion or hypotension, is a major cause of morbidity and mortality in intensive care units (ICU). In the UK, severe sepsis is responsible for 27% of all ICU admissions. Across Europe the average incidence of severe sepsis in the ICU is 30%, with a mortality rate of 27%. In the USA, hospital-associated mortality from sepsis ranges from 18 to 30%; an estimated 9.3% of all deaths occurred in patients with sepsis. Clearly there is a very accessible patient population that could be used to study predictive markers for the onset of sepsis.

Despite greatly improved diagnosis, treatment and support, serious infection and sepsis remain significant causes of death and often result in chronic ill-health or disability in those who survive acute episodes. Although sudden, overwhelming infection is comparatively rare amongst otherwise healthy adults, it constitutes an increased risk in immunocompromised individuals, seriously ill patients in intensive care, burns patients and young children. In a proportion of cases, an apparently treatable infection leads to the development of sepsis: a dysregulated, inappropriate response to infection characterised by progressive circulatory collapse leading to renal and respiratory failure, abnormalities in coagulation, profound and unresponsive hypotension and, in about 30% of cases, death. The incidence of sepsis in North America is about 0.3% of the population annually (about 750,000 cases), with mortality rising to 40% in the elderly and to 50% in cases of the most severe form, septic shock.

It should be noted that clinical sepsis may also result from infection with some viruses (for example Venezuelan Equine Encephalitis Virus, VEEV) and fungi, and that other mechanisms are likely to be involved in such cases.

The ability to detect potentially serious infections as early as possible and, especially, to predict the onset of sepsis in susceptible individuals is clearly advantageous. A considerable effort has been expended over many years in attempts to establish clear criteria defining clinical entities such as shock, sepsis, septic shock, toxic shock and systemic inflammatory response syndrome (SIRS). Similarly, many attempts have been made to design robust predictive models based on measuring a range of clinical, chemical, biochemical, immunological and cytometric parameters, and a number of scoring systems, of varying prognostic success and sophistication, have been proposed.

According to the 1991 Consensus Conference of the American College of Chest Physicians (ACCP) and Society of Critical Care Medicine (SCCM), “SIRS” is considered to be present when patients have more than one of the following: a body temperature of greater than 38° C. or less than 36° C., a heart rate of greater than 90/min, hyperventilation involving a respiratory rate higher than 20/min or PaCO₂ lower than 32 mm Hg, a white blood cell count of greater than 12000 cells/μl or less than 4000 cells/μl.
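The criteria above reduce to four threshold checks and a count. The following is a minimal sketch in R, given only to make the "more than one of the following" rule concrete; the function and argument names are illustrative assumptions and do not appear in the specification.

```r
# Evaluate each of the four 1991 ACCP/SCCM criteria for one patient.
# Function and argument names are hypothetical, chosen for this sketch.
sirs_criteria_met <- function(temp_c, heart_rate, resp_rate, paco2_mmhg, wbc_per_ul) {
  c(
    temperature = temp_c > 38 || temp_c < 36,
    heart_rate  = heart_rate > 90,
    respiration = resp_rate > 20 || paco2_mmhg < 32,
    white_cells = wbc_per_ul > 12000 || wbc_per_ul < 4000
  )
}

# SIRS is considered present when more than one criterion is met.
has_sirs <- function(...) sum(sirs_criteria_met(...)) > 1

# Fever plus tachycardia: two criteria met, so SIRS is considered present.
has_sirs(temp_c = 38.6, heart_rate = 104, resp_rate = 18,
         paco2_mmhg = 38, wbc_per_ul = 9000)
```

Such a check only classifies overt symptoms; it says nothing about infection, which is why SIRS alone does not establish sepsis.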

“Sepsis” has been defined as SIRS caused by infection. It is accepted that SIRS can occur in the absence of infection in, for example, burns, pancreatitis and other disease states. “Infection” was defined as a pathological process caused by invasion of a normally sterile tissue, fluid or body cavity by pathogenic or potentially pathogenic micro-organisms.

“Severe sepsis” is defined as sepsis complicated by organ dysfunction.

“Septic shock” refers (in adults) to sepsis plus a state of acute circulatory failure characterised by a persistent arterial hypotension unexplained by other causes.

The correlation of sepsis and a number of specific serum markers has been extensively studied with a view to developing specific diagnostic and prognostic tests.

However, although many of these markers correlate with sepsis and some give an indication of the seriousness of the condition, no single marker or combination of markers has yet been shown to be a reliable diagnostic test, much less a predictor of the development of sepsis.

Extracting reliable diagnostic patterns and robust prognostic indications from changes over time in complex sets of variables including traditional clinical observations, clinical chemistry, biochemical, immunological and cytometric data requires sophisticated methods of analysis. The use of expert systems and artificial intelligence, including neural networks, for medical diagnostic applications has been under development for some time.

Neural networks are non-linear functions that are capable of identifying patterns in complex data systems. This is achieved by using a number of mathematical functions that make it possible for the network to identify structure within a noisy data set. This is because data from a system may produce patterns based upon the relationships between the variables within the data. If a neural network sees sufficient examples of such data points during a period known as “training”, it is capable of “learning” this structure and then identifying these patterns in future data points or test data. In this way, neural networks are able to predict or classify future examples by modelling the patterns present within the data they have seen. The performance of the network is then assessed by its ability to correctly predict or classify test data, with high accuracy scores indicating that the network has successfully identified true patterns within the data. The parallel processing ability of neural networks is dependent on the architecture of their processing elements, which are arranged to interact according to the model of biological neurones. One or more inputs are regulated by the connection weights to change the stimulation level within the processing element. The output of the processing element is related to its activation level and this output may be non-linear or discontinuous. Training of a neural network therefore comprises an adjustment of interconnected weights depending on the transfer function of the elements, the details of the interconnected structure and the rules of learning that the system follows. Such systems have been applied to a number of clinical situations, including health outcomes models of trauma patients.
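The behaviour of a single processing element can be sketched in a few lines of R. This is an illustration under stated assumptions, not the network of the invention: the logistic transfer function and the gradient-descent update (for a cross-entropy loss) are one common choice of transfer function and learning rule, and all names are hypothetical.

```r
# Logistic transfer function: maps any activation level into (0, 1).
logistic <- function(z) 1 / (1 + exp(-z))

# One processing element: inputs are scaled by connection weights, summed
# with a bias to give the activation level, then passed through the
# non-linear transfer function.
processing_element <- function(inputs, weights, bias) {
  activation <- sum(inputs * weights) + bias
  logistic(activation)
}

# One training step for a single example: adjust weights and bias against
# the prediction error (gradient of the cross-entropy loss).
update_weights <- function(inputs, target, weights, bias, rate = 0.1) {
  error <- processing_element(inputs, weights, bias) - target
  list(weights = weights - rate * error * inputs,
       bias    = bias - rate * error)
}

inputs  <- c(1, -2)
before  <- processing_element(inputs, weights = c(0, 0), bias = 0)
updated <- update_weights(inputs, target = 1, weights = c(0, 0), bias = 0)
after   <- processing_element(inputs, updated$weights, updated$bias)
```

With a target of 1, the output after the update lies closer to 1 than before it, which is the sense in which repeated adjustment of the weights lets the element "learn" from examples.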

US patent application 2002/0052557 describes a method of predicting the onset of a number of catastrophic illnesses based on the variability of the heart-rate of the patient. A neural network is among the possible methods of modelling and analysing the data.

International patent application WO 00/52472 describes a rapid assay method for use in small children based on the serum or neutrophil surface levels of CD11b or ‘CD11b complex’ (Mac-1, CR3). The method uses only a single marker, and one which is, arguably, a well-known marker of neutrophil activation in response to inflammation.

The alternative approach to analysing such complex data sets where the data are often qualitative and discrete, rather than quantitative and continuous, is to use sophisticated statistical analysis techniques such as logistic regression. Where logistic regression using qualitative binary dependent variables is insufficiently discriminating in terms of selecting significant variables, multivariate techniques may be used. The outputs from both multiple logistic regression models and neural networks are continuously variable quantities but the likelihoods calculated by neural network models usually fall at one extreme or the other, with few values in the middle range. In a clinical situation this is often helpful and can give clearer decisions.

The ability to detect the earliest signs of infection and/or sepsis has clear benefits in terms of allowing treatment as soon as possible. Indications of the severity of the condition and likely outcome if untreated inform decisions about treatment options. This is relevant both in vulnerable hospital populations, such as those in intensive care, or who are burned or immunocompromised, and in other groups in which there is an increased risk of serious infection and subsequent sepsis. The use or suspected use of biological weapons in both battlefield and civilian settings is an example where a rapid and reliable means of testing for the earliest signs of infection in individuals exposed would be advantageous.

However, until now neither a test nor a list of biomarkers has been identified/produced which can detect or predict sepsis pre-symptomatically with a high predictive accuracy (for example >75%, but preferably >90%).

SUMMARY

The present invention thus aims to provide a biomarker signature (list of biomarkers), and methods for classifying biological samples using the biomarker signature, to pre-symptomatically predict/detect the development of sepsis with a high predictive accuracy, and especially a biomarker signature that could differentiate between sepsis and SIRS with an accuracy of at least 95%, and/or differentiate between sepsis and non-sepsis with an accuracy of at least 92%.

With this in mind, the applicants have determined a biomarker signature (list of biomarkers) predictive of the development of sepsis prior to the onset of symptoms (pre-symptomatic) and capable of a mean predictive accuracy of at least 75% to differentiate development of sepsis from non-sepsis, and sepsis from SIRS, wherein the biomarker signature comprises at least 25 genes, or the products expressed by those genes, selected from the list of genes consisting of the 266 genes listed in Table 1. The Applicant has identified through a comprehensive analysis of the host transcriptome, sourced from blood samples from human patients collected prior to the clinical onset of sepsis, a panel of 266 genes (Table 1) highly significant to the onset of symptoms of sepsis. The full panel and subsets thereof were used in a number of statistical models to determine discrimination between sepsis and non-sepsis patients, and between patients with sepsis and SIRS. In order to achieve a mean predictive accuracy of greater than 75%, the Applicant has shown that a signature of at least 25 gene biomarkers can be randomly selected from the 266 genes listed in Table 1.

The Applicant has in particular shown through an analysis of 44,014 combinations/biomarker signatures of 44 biomarkers, randomly selected from the list of 266, that all combinations have a mean predictive accuracy of greater than 75%. These results are illustrated by the 15 specific combinations listed in Table 24, which have the accuracies shown in FIG. 3. Thus in one embodiment the biomarker signature comprises at least 44 genes selected from the list of genes consisting of the 266 genes listed in Table 1.
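Random selection of signatures from the panel can be sketched as follows. This is a hedged illustration only: the gene names are placeholders standing in for the 266 names of Table 1, the seed is arbitrary, and `draw_signatures` is a hypothetical helper, not a function from the specification.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

# Placeholder names standing in for the 266 gene biomarkers of Table 1.
panel <- paste0("GENE", 1:266)

# Draw n_signatures random signatures, each of `size` distinct biomarkers.
draw_signatures <- function(panel, size = 44, n_signatures = 5) {
  replicate(n_signatures, sample(panel, size), simplify = FALSE)
}

signatures <- draw_signatures(panel)

# Number of possible 44-gene signatures; far too many to test exhaustively,
# so any analysis of this kind necessarily works on a random sample.
possible_signatures <- choose(266, 44)
```

Each sampled signature would then be scored by a classifier, for example over repeated training/test splits, to obtain its mean predictive accuracy.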

The Applicant has also identified biomarker signatures, comprising at least 25, at least 44, and all 266 gene biomarkers, which are capable of differentiating development of sepsis from non-sepsis with a mean predictive accuracy of at least 92%, and development of sepsis from SIRS with a mean predictive accuracy of at least 95%.

The Applicant has produced and trained an artificial neural network (ANN) which can provide a predictive accuracy for any selection of biomarkers from the 266 to differentiate between sepsis and non-sepsis and/or sepsis from SIRS, and thereby provide a likelihood of whether a patient is to develop sepsis or not through inputting the patient data set into the ANN.

A patient data set, for example that comprising gene expression levels for the 266 biomarkers in a patient blood sample, is inputted into the ANN, having selected a biomarker signature (list of biomarkers), and will thereby output the predictive accuracy of the selected biomarker signature, and also indicate whether the specific patient data set is indicative of the development of sepsis, versus non-sepsis and/or SIRS. The R script for the trained ANN is detailed in Table 2.

The Applicant has shown that a biomarker signature (list of biomarkers) comprising at least 25 genes, but preferably about 44 genes, or the products expressed by those genes, selected from the list of genes consisting of the 266 genes listed in Table 1, as inputted into a mathematical model such as the ANN detailed in Table 2, can be predictive of the development of sepsis prior to the onset of symptoms (pre-symptomatic) and be capable of a mean predictive accuracy of at least 92% to differentiate development of sepsis from non-sepsis.

The Applicant has also shown that a biomarker signature (list of biomarkers) comprising at least 25 genes, but preferably about 44 genes, or the products expressed by those genes, selected from the list of genes consisting of the 266 genes listed in Table 1, as inputted into a mathematical model such as the ANN detailed in Table 2, can be predictive of the development of sepsis prior to the onset of symptoms (pre-symptomatic) and capable of a mean predictive accuracy of at least 95% to differentiate development of sepsis from SIRS. Biomarker signatures providing such high predictive accuracies have not until now been identified, and clearly the use of such signatures could greatly improve the power of kits, apparatuses and methods to be able to identify patients likely to develop sepsis, i.e. presymptomatically, and also to monitor patients with sepsis, and potentially inform patient treatment.

TABLE 1 The 266 gene biomarkers predictive of pre-symptomatic development of sepsis, as down-selected from the whole transcriptome using a multitude of mathematical methods.

ACTR6 EBI2 CXORF42 SORBS3 RPL11 SLC26A8 ATP2A2 BIN1 GAS7 CLASP1
TIMM9 PPP2R2B WDR37 ZNF608 C16ORF7 HIST2H4B CD2 TST NOL11 ZNF17
TBC1D8 CD247 IL1R1 C14ORF112 CCDC65 GZMK ANKS1A RRBP1 CLNS1A LGALS2
BCL6 NCOA3 ZNF32 CD59 RPL26 CYB561 LTA MRPL24 PDCD4 TMEM42
EIF3D PHCA FCER1A EEF1B2 LOC646483 RASGRP1 TCEA3 GYG1 NSUN7 GRB10
CTSS KLRG1 RPL18A SLC2A11 KIF1B LETMD1 HS.445036 CD7 HLA-DRA RPS14
SERTAD2 MMP9 IRAK3 LARP5 CACNA1E GRAMD4 RPS6 RPS20 PAG1 FAM160A2
LOC646766 C12ORF57 MRPS6 SIVA RPL38 RPL19 CTDP1 MRPL50 AOC2 OLFML2B
SS18L2 RPL12 RPS15 ATP8B4 ADRB2 LY6E PTPRCAP TMC6 PRKCQ SLC36A1
RPS3A BOAT LOC285176 RPL13 TTLL3 OLFM1 WWP1 TDRD9 C21ORF7 IL1R2
RPL7A CDO1 HLA-DRB3 ARG1 RUNX1 CD3D HLA-DMA RPS27 RPSA ZNF430
CKAP4 RPL27A CPA3 GBP1 SH2D1A RPS15A TOMM7 EMILIN2 PHTF1 DHRS3
EOMES SMAD2 RPL30 TCTN1 HIBADH NT5DC2 FLT3LG CUTL1 THBS3 RCN2
SLC38A10 MUC1 LOC153561 GTPBP8 CD96 TP53BP2 PECI ACVR1B PFKFB2 ITGAM
ICAM2 CCL5 ZNHIT3 NDST2 C13ORF23 RPL22 FBXO34 LDHA C12ORF62 LEPROTL1
EFCBP1 DACH1 RPS25 CYP1B1 LOC652071 ASNSD1 MS4A4A ZFAND1 FBXW2 SLC41A3
ATXN7L3 MRPS27 MAFG P117 TMEM150 ITGAX ZC3H3 TRPM2 AKR1B1 LOC644096
PYHIN1 SSBP2 LOC647099 NAPB RPL4 BTBD11 IL32 RPL13A SLBP OPLAH
LARP4B PLAC8 C5ORF39 HLA-DMB RPL9 RTP4 PTPN1 HIPK2 CD3E GBP4
RPS29 RPS17 RPL5 EXOC7 CR1 EXOSC5 SIGIRR RPL32 SIL1 CMTM4
DIP2A CXORF20 SMPDL3A RPL10A UPP1 ARID5B GALM CDKN2AIP THNSL1 POP5
TFB1M ZDHHC19 HDC CD177 TRAT1 NMT2 AMD1 SORT1 ICOS C12ORF65
OSTALPHA FAM26F C22ORF9 RPS8 LDOC1 ATP9A MYBPC3 ZNF195 DNAJC5 RPL24
LSG1 METTL7B P2RY5 TMEM204 GOLGA1 PGD AMPH LOC646200 RARRES3 TBCC
KIAA1881 NLRC4 C11ORF1 ITM2A RPL18 SLC26A6 MACF1 LDLR C9ORF103 HLA-DPA1
RPS10 SELM P4HB HK3 CD6 GPR107 RPS5 RPS18 RPL15 EXT1
CRIP2 FAM69A SIRPG RPL36 RPS13 CSGALNACT2

TABLE 2 The R script for a trained artificial neural network (ANN) for calculating the predictive accuracy for a biomarker signature selected from the 266 biomarkers to differentiate development of sepsis versus non-sepsis and/or SIRS, and thereby indicate the likelihood that a patient data set inputted into the ANN is indicative of the development of sepsis or not.

# DATA PROCESSING:
rawdata <- read.csv("Data/44 top performing genes.csv")
transposed <- data.frame(t(rawdata[ ,-1]))
names(transposed) <- c("Diagnosis", "Day", as.character(rawdata$SAMPLE_ID[3:nrow(rawdata)]))
transposed$Diagnosis <- factor(transposed$Diagnosis, levels = c(0,1), labels = c("No Sepsis", "Sepsis"))
for.normalising <- transposed[ ,3:ncol(transposed)]
not.for.normalising <- transposed[ ,1:2]
medians <- apply(for.normalising, 2, median)
normalised.genes <- sweep(data.matrix(for.normalising), 2, medians)
normalised.data <- data.frame(not.for.normalising, normalised.genes)
input <- normalised.data[ ,-2]

# TRAINING/TEST SPLIT:
cases <- nrow(input)
cases.train <- sample(1:cases, round((0.7*cases), digits = 0))
training <- input[cases.train, ]
test <- input[-cases.train, ]

# NEURAL NETWORK:
library(nnet)
nntraining <- nnet(Diagnosis ~ ., data = training, size = 1, rang = 1,
                   decay = 0.01, maxit = 1000, Hess = FALSE, MaxNWts = 1000,
                   abstol = 1.0e-4, reltol = 1.0e-8, trace = TRUE, skip = FALSE,
                   linout = FALSE, softmax = FALSE, censored = FALSE, entropy = TRUE)
# Unused nnet arguments: weights = 1, Wts = 1, mask = all, entropy = FALSE
Outcome <- test$Diagnosis
nn_Prediction <- predict(nntraining, test, type = "class")
dfAll <- data.frame(Outcome, nn_Prediction)
prediction.table <- xtabs(~Outcome+nn_Prediction, data = dfAll)
c(prediction.table[1,1] + prediction.table[2,2], prediction.table[1,2], prediction.table[2,1])/nrow(test)
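The final line of the script in Table 2 turns a 2 x 2 table of outcomes against predictions into three fractions of the test set: correct classifications, non-sepsis samples predicted as sepsis, and sepsis samples predicted as non-sepsis. A minimal base-R sketch of that step follows, using invented outcome and prediction vectors rather than real patient data; fixing the factor levels of the predictions is an added assumption of this sketch, so that the table keeps both columns even if a model predicts only one class.

```r
classes <- c("No Sepsis", "Sepsis")

# Invented example values, for illustration only.
outcome   <- factor(c("No Sepsis", "No Sepsis", "Sepsis",
                      "Sepsis", "Sepsis", "No Sepsis"), levels = classes)
predicted <- c("No Sepsis", "Sepsis", "Sepsis",
               "Sepsis", "No Sepsis", "No Sepsis")

# Give predictions the same levels as outcomes so the table is always 2 x 2.
prediction <- factor(predicted, levels = classes)
prediction_table <- table(Outcome = outcome, Prediction = prediction)

# Fractions of the test set: correct, false positive, false negative.
summary_rates <- c(
  accuracy       = prediction_table[1, 1] + prediction_table[2, 2],
  false_positive = prediction_table[1, 2],
  false_negative = prediction_table[2, 1]
) / length(outcome)
```

Here four of the six samples are classified correctly, with one false positive and one false negative. Because the training/test split is random, a mean predictive accuracy would be obtained by averaging this first value over repeated runs.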

Preferred biomarker signatures for use in the present invention are those that result in a mean predictive accuracy of at least 92% to differentiate development of sepsis from non-sepsis, or a mean predictive accuracy of at least 95% to differentiate development of sepsis from SIRS, which can be identified by a simple iterative approach, inputting biomarker signatures into a mathematical model, such as the trained ANN detailed in Table 2. The Applicant has in particular used this approach to identify a key biomarker signature of 44 biomarkers which can differentiate sepsis from SIRS with 100% predictive accuracy, and sepsis from non-sepsis with 97% predictive accuracy.

Accordingly, in a first aspect, the present invention provides a diagnostic kit for predicting the development of sepsis prior to the onset of symptoms (pre-symptomatic), said kit comprising means for detecting levels of a gene or gene product of each member of a biomarker signature in a sample, wherein the biomarker signature comprises at least 25 genes, or the products expressed by those genes, selected from the list of genes consisting of the 266 genes listed in Table 1.

The biomarker signature may be capable of a mean predictive accuracy of at least 75% to differentiate development of sepsis from non-sepsis, and sepsis from SIRS, though particularly advantageously the biomarker signature is capable of a mean predictive accuracy of at least 92% to differentiate development of sepsis from non-sepsis, and/or a mean predictive accuracy of at least 95% to differentiate development of sepsis from SIRS.

Microarray technology was used to obtain gene expression data of samples derived from pre-symptomatic sepsis patients and control non-sepsis patient samples. An unsupervised bioinformatic approach was used to identify prognostic transcriptomic expression patterns that characterize sepsis before the onset of clinical symptoms. These characteristic biomarker patterns were further analysed and validated using quantitative RT-PCR.

The Applicant has shown that use of all 266 biomarkers provides a predictive accuracy of more than 95% to differentiate both the development of sepsis from non-sepsis and sepsis from SIRS. A selection of 44 biomarkers from the 266 can potentially provide a predictive accuracy up to 100% to differentiate the development of sepsis from SIRS, and a predictive accuracy of at least 97% to differentiate the development of sepsis from non-sepsis.

The Applicant has in particular identified a biomarker signature containing 44 biomarkers, the list consisting of those biomarkers in Table 3, which when all 44 biomarkers are used for the prediction is capable of up to 100% predictivity of sepsis versus SIRS. Use of a specific list of 25 biomarkers down-selected from these 44, as listed in Table 3, is capable of a predictive accuracy of at least 92% to differentiate development of sepsis from non-sepsis, and at least 95% to differentiate development of sepsis from SIRS. These predictive accuracies are in particular obtainable using the artificial neural network detailed in Table 2, though such accuracies may be obtained using other mathematical models, and other artificial neural networks.

TABLE 3 Specific (first) biomarker signature consisting of 44 biomarkers selected from the 266 gene biomarkers, and a further down-selected list of 25 biomarkers.

44 Gene Biomarker Signature: ACTR6, BIN1, C16ORF7, CD247, CLNS1A, CYB561, FCER1A, GRB10, HS.445036, LARP5, LOC646766, MRPL50, ADRB2, BOAT, C21ORF7, CD3D, CPA3, DHRS3, FLT3LG, GTPBP8, ICAM2, LDHA, LOC652071, MRPS27, AKR1B1, BTBD11, C5ORF39, CD3E, CR1, DIP2A, GALM, HDC, ICOS, LDOC1, LSG1, AMPH, C11ORF1, C9ORF103, CD6, CRIP2, EBI2, GAS7, HIST2H4B, IL1R1

Down-selected 25 Gene Biomarker Signature: ACTR6, BIN1, C16ORF7, CD247, CLNS1A, CYB561, FCER1A, GRB10, HS.445036, LARP5, LOC646766, MRPL50, ADRB2, BOAT, C21ORF7, CD3D, CPA3, DHRS3, FLT3LG, GTPBP8, ICAM2, LDHA, LOC652071, MRPS27, AKR1B1

A further list of 45 gene biomarkers selected from the list of 266, as detailed in Table 4, was also shown to have a predictivity of higher than 92% to differentiate sepsis from non-sepsis, especially with a specific down-selected list of 25 biomarkers.

TABLE 4 Further (second) specific biomarker signature consisting of 45 biomarkers selected from the 266 gene biomarkers, and a further down-selected list of 25 biomarkers.

45 Gene Biomarker Signature: ATP9A, C16ORF7, C5ORF39, C9ORF103, CACNA1E, CD177, DHRS3, EEF1B2, FCER1A, FLT3LG, GAS7, GRB10, HLA.DMA, HS.445036, IL1R1, IL1R2, LOC285176, MYBPC3, NCOA3, NDST2, RPL10A, EBI2, LOC646483, RPL13A, RPL18, RPL18A, RPL32, RPL36, RPL9, RPS20, RPS29, RPS6, SIGIRR, SLBP, SLC26A6, SMPDL3A, SORBS3, TCEA3, TCTN1, THBS3, THNSL1, TIMM9, TOMM7, ZFAND1, ZNHIT3

Down-selected 25 Gene Biomarker Signature: C16ORF7, C5ORF39, C9ORF103, CD177, FCER1A, GAS7, LOC285176, MYBPC3, NDST2, EBI2, RPL13A, RPL18A, RPL32, RPL36, RPL9, RPS20, RPS29, RPS6, SIGIRR, TCEA3, TCTN1, TIMM9, TOMM7, ZFAND1, ZNHIT3

These further (second) two biomarker signatures of 45 and 25 have 11 and 6 biomarkers, respectively, in common with the first 44 gene biomarker signature. The Applicant has also evaluated 14 further combinations of 44 biomarkers in detail, of which all combinations have a mean predictive accuracy of at least 75%, but of which 6 combinations have a mean predictive accuracy of at least 92%. These signatures are listed in Table 5. These six signatures have at least 5 genes in common with the first 44 gene biomarker signature in Table 3, and thus in one embodiment any combination of 44 biomarkers or 25 biomarkers selected from the 266 may comprise at least 5 biomarkers from the first 44 in order to provide a mean predictive accuracy of at least 92%.
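The overlap counts stated above can be checked directly from the gene lists of Tables 3 and 4 with a set intersection. The sketch below transcribes those lists into R vectors; the variable names are illustrative.

```r
# First 44 gene biomarker signature (Table 3).
first44 <- c("ACTR6", "BIN1", "C16ORF7", "CD247", "CLNS1A", "CYB561", "FCER1A",
             "GRB10", "HS.445036", "LARP5", "LOC646766", "MRPL50", "ADRB2",
             "BOAT", "C21ORF7", "CD3D", "CPA3", "DHRS3", "FLT3LG", "GTPBP8",
             "ICAM2", "LDHA", "LOC652071", "MRPS27", "AKR1B1", "BTBD11",
             "C5ORF39", "CD3E", "CR1", "DIP2A", "GALM", "HDC", "ICOS", "LDOC1",
             "LSG1", "AMPH", "C11ORF1", "C9ORF103", "CD6", "CRIP2", "EBI2",
             "GAS7", "HIST2H4B", "IL1R1")

# Second 45 gene biomarker signature (Table 4).
second45 <- c("ATP9A", "C16ORF7", "C5ORF39", "C9ORF103", "CACNA1E", "CD177",
              "DHRS3", "EEF1B2", "FCER1A", "FLT3LG", "GAS7", "GRB10", "HLA.DMA",
              "HS.445036", "IL1R1", "IL1R2", "LOC285176", "MYBPC3", "NCOA3",
              "NDST2", "RPL10A", "EBI2", "LOC646483", "RPL13A", "RPL18",
              "RPL18A", "RPL32", "RPL36", "RPL9", "RPS20", "RPS29", "RPS6",
              "SIGIRR", "SLBP", "SLC26A6", "SMPDL3A", "SORBS3", "TCEA3",
              "TCTN1", "THBS3", "THNSL1", "TIMM9", "TOMM7", "ZFAND1", "ZNHIT3")

# Down-selected 25 gene signature from the second list (Table 4).
second25 <- c("C16ORF7", "C5ORF39", "C9ORF103", "CD177", "FCER1A", "GAS7",
              "LOC285176", "MYBPC3", "NDST2", "EBI2", "RPL13A", "RPL18A",
              "RPL32", "RPL36", "RPL9", "RPS20", "RPS29", "RPS6", "SIGIRR",
              "TCEA3", "TCTN1", "TIMM9", "TOMM7", "ZFAND1", "ZNHIT3")

shared45 <- intersect(second45, first44)
shared25 <- intersect(second25, first44)

length(shared45)  # 11
length(shared25)  # 6
```

The six genes shared by the down-selected list are C16ORF7, C5ORF39, C9ORF103, FCER1A, GAS7 and EBI2; the 45-gene list additionally shares DHRS3, FLT3LG, GRB10, HS.445036 and IL1R1.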

In another embodiment, the at least 25 genes comprises at least 11 genes selected from the first 44 gene biomarker signature. In a third embodiment the at least 25 genes comprises at least the complete first 25 gene biomarker signature (listed in Table 3). In a fourth embodiment, the biomarker signature of the present invention comprises the complete first 44 gene biomarker signature (listed in Table 3).

TABLE 5 Six combinations of 44 biomarkers selected from the list of 266 biomarkers which have a mean predictive accuracy of sepsis vs non-sepsis of at least 92% by using the artificial neural network detailed in Table 2

1           2           3           4           5           6
CYB561      ACTR6       C16ORF7     CD6         BCL6        EBI2
GRB10       BIN1        LARP5       CD247       CLNS1A      CD247
BTBD11      LOC646766   C21ORF7     CLNS1A      CYB561      FCER1A
CD3E        ICAM2       GTPBP8      C5ORF39     FCER1A      ICAM2
EBI2        LOC652071   LDHA        GALM        C21ORF7     C5ORF39
CD7         ICOS        MRPS27      ICOS        FLT3LG      EEF1B2
LOC285176   CD7         BTBD11      AOC2        CTSS        C12ORF57
HLA-DMA     IL1R2       HDC         IL1R2       CD96        AOC2
C12ORF62    ASNSD1      CRIP2       CUTL1       CCL5        EOMES
ASNSD1      MAFG        IL1R1       CDKN2AIP    HLA-DMB     IL32
GPR107      GBP4        CACNA1E     ITM2A       CDKN2AIP    GBP4
BCL6        CXORF20     LOC285176   CLASP1      GPR107      CD177
MRPL24      HLA-DPA1    HLA-DMA     C14ORF112   CXORF42     HLA-DPA1
RPL7A       CXORF42     ASNSD1      BCL6        CLASP1      C14ORF112
RPL13A      MRPL24      LOC644096   LOC646483   RPS27       BCL6
RPS5        PTPRCAP     IL32        RPS27       SMAD2       MRPL24
CCDC65      RPL7A       FAM69A      P117        ZNHIT3      RPS27
NCOA3       PYHIN1      MRPL24      RPL9        RPL13A      P117
RASGRP1     RASGRP1     HLA-DRA     RPS10       SMPDL3A     NCOA3
RPS6        RPL30       MRPS6       SORBS3      TMEM150     RASGRP1
NMT2        EFCBP1      RPS27       TST         FAM26F      RPS6
ZNF32       TMEM150     PYHIN1      RPL18A      TBCC        PECI
SERTAD2     RPL32       SIGIRR      SS18L2      TCEA3       EFCBP1
RPL38       ZNF195      SMPDL3A     CDO1        ITGAX       TMEM150
SLC38A10    TMEM204     P2RY5       RPS15A      PTPN1       NMT2
ACVR1B      SELM        RARRES3     TMEM150     TFB1M       SLC26A6
P4HB        RPS18       TTLL3       RPL32       AMD1        RPL11
SLC26A8     PPP2R2B     RPS15A      SLC26A6     KIAA1881    PPP2R2B
WDR37       OLFM1       RPL36       RPS20       CD59        ZNF32
PAG1        TCTN1       TMEM42      HLA-DRB3    KIF1B       ACVR1B
RPL19       DACH1       HLA-DRB3    TCTN1       RPL19       TFB1M
SLC41A3     ITGAX       FBXW2       P4HB        NAPB        P4HB
LARP4B      TFB1M       LOC647099   RPL15       ZDHHC19     RPL15
ZDHHC19     GYG1        ZNF17       RPS13       EXT1        ZNF17
SORT1       MMP9        CD59        WDR37       ZNF608      EIF3D
NLRC4       PAG1        KIF1B       ANKS1A      TBC1D8      MMP9
EXT1        RPS15       SLC36A1     KIF1B       RRBP1       SLC36A1
ATP2A2      CKAP4       PFKFB2      MMP9        ATP8B4      NAPB
ZNF608      RPL22       SLC41A3     EXOC7       RPS3A       ARID5B
RRBP1       ZDHHC19     EXOC7       CMTM4       RPL27A      HK3
TDRD9       SORT1       HK3         RPL24       PHTF1       CSGALNACT2
RUNX1       NLRC4       ATP2A2      CSGALNACT2  FBXO34      FAM160A2
LOC153561   CSGALNACT2  LETMD1      ATP2A2      CYP1B1      RPS3A
ITGAM       RRBP1       ITGAM       FAM160A2    RPL4        RPL27A

In a second aspect, the present invention provides a method for analysis of a biological sample from an animal to predict and monitor the development of sepsis, especially prior to onset of symptoms, comprising monitoring, measuring and/or detecting the expression of all biomarkers in the selected biomarker signature (list of biomarkers), and evaluating/assessing data produced from the monitoring, measuring and/or detecting to predict and monitor the development of sepsis.

The method is preferably capable of differentiating sepsis from non-sepsis, with high levels of accuracy, such as >75%, but preferably >90% accuracy, or as high as >92%, and also potentially sepsis from SIRS with the same predictivities.

The animal may be a human, and the biological sample is most likely a blood or serum sample.

The diagnostic kit of the invention provides the means for detecting levels of a gene or gene product of the genes comprising the biomarker signatures described above. Although gene expression may be determined by detecting the presence of gene products including proteins and peptides, such processes may be complex. In a particular embodiment, the means comprises means for detecting a nucleic acid and in particular DNA, or a gene product which is RNA such as mRNA.

The monitoring, measuring or detecting may use any suitable technique, including use of recognition elements, or microarray based methods. Thus in a particular embodiment, the kit of the invention comprises a microarray on which are immobilised probes suitable for binding to RNA expressed by each gene of the biomarker signature.

In an alternative embodiment, the kit comprises at least some of the reagents suitable for carrying out amplification of genes or regions thereof, of the biomarker signature.

In one embodiment the monitoring, measuring or detecting the expression of biomarkers uses real-time (RT) polymerase chain reaction (PCR). In such cases, the means may comprise primers for amplification of said genes or regions thereof. The kits may further comprise labels, in particular fluorescent labels, and/or oligonucleotide probes to allow the PCR to be monitored in real-time using any of the known assays, such as TaqMan, LUX, etc. The kits may also contain reagents such as buffers, enzymes, salts such as MgCl₂, etc. required for carrying out a nucleic acid amplification reaction.

The method of the second aspect is advantageously computer-implemented to handle the complexity in monitoring and analysis of the numerous biomarkers, and their respective relationships to each other. Such a computer implemented invention could enable a yes/no answer as to whether sepsis is likely to develop, or at least provide an indication of how likely the development of sepsis is.

The method preferably uses mathematical modelling tools and/or algorithms to monitor and assess expression of the biomarkers both qualitatively and quantitatively. The tools could in particular include support vector machine (SVM) algorithms, decision trees, random forests, artificial neural networks, quadratic discriminant analysis, and Bayes classifiers. In a preferred embodiment the data from monitoring all biomarkers in the biomarker signature is assessed by means of an artificial neural network, for example the trained artificial neural network detailed in Table 2.

In one embodiment of the second aspect the method is a computer-implemented method wherein the monitoring, measuring and/or detecting comprises producing quantitative, and optionally qualitative, data for all biomarkers, inputting said data into an analytical process on the computer, using at least one mathematical method, that compares the data with reference data, and producing an output from the analytical process which provides a prediction for the likelihood of developing sepsis, or enables monitoring of the sepsis condition. The reference data may include data from healthy subjects, subjects diagnosed with sepsis, and subjects with SIRS, but no infection.

The output from the analytical process may enable the time to onset of symptoms to be predicted, such as 1, 2, or 3 days prior to onset of symptoms, and consequently may be particularly valuable and useful to a medical practitioner in suggesting a course of treatment, especially when the choice of course of treatment is dependent on the progression of the disease. The method may also enable monitoring of the success of any treatment, assessing whether the likelihood of onset of symptoms decreases over the course of treatment.

In a third aspect, the present invention provides an apparatus for analysis of a biological sample from an animal to predict and monitor the development of sepsis comprising means for monitoring, measuring or detecting the expression of all biomarkers in the biomarker signature as described above, such as RT-PCR using reagents specific to the biomarkers in the biomarker signature, and means for analysis of data produced from the means for monitoring, measuring or detecting, such as a computer comprising an appropriate mathematical model to analyse the data, such as an artificial neural network, and means for providing an output from the analysis which output provides a prediction of the likelihood of an animal having sepsis, or an output to enable monitoring of sepsis, which output could also be provided by an appropriately programmed computer.

The present invention will now be described with reference to the following non-limiting examples and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a display of Bioanalyzer results for a randomised selection of RNA sample preparations.

FIG. 2 is an illustration depicting the rationale for sample selection, and especially the selection of control samples, and the matching with sepsis patient samples.

FIG. 3 is a graph detailing the predictive accuracies for sepsis versus non-sepsis of the 15 combinations detailed in Table 24.

FIG. 4 is a graph and table indicating the predictive accuracies for different subsets of biomarkers selected from the 266 biomarkers in Table 1.

EXAMPLE Development of a Predictive Panel of Pre-Symptomatic Biomarkers for Sepsis

The aim of this program of work was to develop a predictive panel of pre-symptomatic biomarkers for sepsis, through comprehensive analysis of the host transcriptome, sourced from blood samples from human patients collected prior to the clinical onset of sepsis, and to develop biomarker signatures that may indicate whether and when clinical symptoms will arise following infection. In so doing it would yield a suitably powered bioinformatic model for differentiating sepsis patients from control patients based on transcriptomic biomarker signatures. In turn, this will assist in the development of RT-PCR methods for sepsis prediction, where this capability should provide timely diagnosis and treatment of infection when medical countermeasures are most effective.

We used microarray technology to obtain gene expression data of samples derived from pre-symptomatic sepsis patients and control non-sepsis patient samples. An unsupervised bioinformatic approach was used to identify prognostic transcriptomic expression patterns that characterize sepsis before the onset of clinical symptoms. These characteristic biomarker patterns were further analysed and validated using quantitative RT-PCR on the Fluidigm BioMark™ real-time PCR array platform.

Through significance testing a final panel of 266 biomarkers was derived. The full panel and subsets of it were then used in a number of statistical models to determine discrimination between sepsis and non-sepsis patients. The artificial neural network gave the highest predictive accuracy, with 44 biomarkers being the optimal subset.

Technical Summary

Acquisition and Storage of Patient Samples—

Patients were admitted to the study if they gave informed consent, were between 18 and 80 years of age and undergoing a procedure that, in the clinician's opinion, had a risk of causing infection and ultimately sepsis. Typically these were abdominal and thoracic surgeries. However, other surgical procedures were permitted and included, with one extensive maxillofacial procedure resulting in sepsis in one case. Patients were excluded if they were either pregnant, infected with a known pathogen (HIV, Hepatitis A, B or C), immunosuppressed or withdrew consent to take part in the study at any time during their stay. All patients received the normal standard of care once enrolled.

Blood samples were collected according to a protocol. Briefly, two 4 ml aliquots of patient blood were collected into sterile EDTA vacutainers and then immediately transferred into RNAse-free vials containing 10.5 ml of RNAlater® (an RNA stabilization medium) (Life Technologies, USA). These were then stored at −20° C. and eventually transported on dry ice. In addition 4 ml of patient blood was collected into a serum separation tube, spun, separated and stored at −20° C. Blood collection occurred once between 1 and 7 days before surgery and then once daily on each day post-surgery. Post-operative blood collection was stopped after the patient was discharged from hospital, or after 7 days post-surgery, or once sepsis had been confirmed by the clinician. Additional patient information (e.g. daily patient metrics, type of surgery and microbiology results) was captured using a bespoke database provided by ItemTracker, UK.

We recruited 2273 elective surgery patients into the study with 1842 patient time courses in storage; 72 of these patients went on to develop sepsis. The incidence of sepsis in our patient cohort, calculated against the 1842 stored time courses, is therefore 3.91%. Over 600 of the remaining patients met the criteria set for SIRS (2 out of the following four symptoms: increased/decreased temperature; increased heart rate; increased ventilation rate; increased/decreased white blood cell count). However, many of these “SIRS” patients had very transient changes in symptomology. We suspect that the 438 patients, as identified by the clinical staff at the centres, are more reflective of the number of patients with prolonged SIRS.

This patient recruitment was sufficient to satisfy the requirement for 30 sepsis patient time courses (plus matched non-sepsis patient controls) to be used for biomarker discovery during 2011 as well as a further 40 sepsis patient time courses (plus matched non-sepsis patient controls) for the validation of biomarkers during 2012.

An initial batch of 61 SIRS patient blood samples was analysed. Of these samples, 2 were identified as having microbial DNA present in the blood (one patient had E. coli and the other had S. aureus). These patients were re-classified as belonging to the patient cohort that goes on to develop sepsis. The remaining 59 patients had undetectable levels of microbial DNA present in their blood. This indicated that these patients truly belonged to the SIRS patient group. The biomarker signatures from both groups of patients were then used in a biomarker discovery analysis that provided a biomarker signature for the pre-symptomatic diagnosis of sepsis in elective surgery patients. A second batch of 190 patient samples containing samples from patients who developed either SIRS or sepsis, as well as samples from patients who did not develop any post-surgical symptoms (post-operative controls) were again sent for analysis using the Sepsitest. All post-operative control patient samples were confirmed as negative by the Sepsitest. Additionally all the patient samples isolated from sepsis patients with blood borne infections were also identified correctly. All of the SIRS patients were confirmed as not septic.

RNA Extraction from Stabilization Media—

The RNA from all patient samples selected for further microarray and Fluidigm array analysis was extracted using the RiboPure™-Blood kit (Life Technologies, USA), followed by treatment with TURBO DNA-free™ (Life Technologies, USA). In order to give confidence in the quality of sample preparation, the quality of all RNA products was assessed on the Agilent 2100 BioAnalyser (Agilent USA) using the Agilent BioAnalyser RNA 6000 Nano kit (Agilent USA). Having regard to FIG. 1, a qualitative indication of the hundreds of RNA samples is shown using 12 randomly selected samples run on the Agilent 2100 BioAnalyser (Agilent USA). The double banding in each lane indicates good quality RNA with little degradation. Further quantitative measures of the quality and quantity of RNA preparation, such as the RNA integrity number (RIN) and the concentration of RNA in each preparation, indicated that RNA isolation protocols were fit for purpose (Table 6).

TABLE 6 Quantification and integrity of typical RNA samples.

          Results
Patient   RIN      Concentration   Total           Did the sample pass QC
sample    Result   (μg/ml)         concentration   (RIN >7.0 / RNA >2.0)?
1         8.0      49              4.41            Yes
2         7.0      23              2.7             Yes
3         7.0      36              3.24            Yes
4         7.5      34              3.6             Yes
5         8.5      28              2.52            Yes
6         8.9      30              2.70            Yes
7         8.4      50              4.50            Yes
8         7.3      30              2.70            Yes
9         7.3      45              4.5             Yes
10        7.9      48              4.32            Yes
11        7.5      38              3.42            Yes
12        7.8      47              4.23            Yes

Over 99% of RNA samples achieved a RIN of 7 or above with a yield of 2 μg or above. This was sufficient quality and quantity to undertake microarray and quantitative RT-PCR analyses on these samples. On the rare occasions when the sample preparations gave an unsatisfactory yield, the process was repeated four times and the product sent for quantitative RT-PCR only (i.e. there was sufficient RNA to produce cDNA and subsequently undertake PCR).
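The acceptance rule stated above (a RIN of 7 or above and a yield of 2 μg or above) can be illustrated with a minimal Python sketch. The values are copied from Table 6; the function and variable names are hypothetical and are not part of the study software, and the "total concentration" column is assumed here to be the yield in μg.

```python
# (patient sample, RIN, total RNA) values copied from Table 6.
# Assumption: the "Total concentration" column is the yield in micrograms.
TABLE_6 = [
    (1, 8.0, 4.41), (2, 7.0, 2.7), (3, 7.0, 3.24), (4, 7.5, 3.6),
    (5, 8.5, 2.52), (6, 8.9, 2.70), (7, 8.4, 4.50), (8, 7.3, 2.70),
    (9, 7.3, 4.5), (10, 7.9, 4.32), (11, 7.5, 3.42), (12, 7.8, 4.23),
]

MIN_RIN = 7.0    # "a RIN of 7 or above"
MIN_YIELD = 2.0  # "a yield of 2 ug or above"


def passes_qc(rin, total_rna):
    """Return True when a preparation meets both quality thresholds."""
    return rin >= MIN_RIN and total_rna >= MIN_YIELD


def qc_report(rows):
    """Map each patient sample number to its pass/fail result."""
    return {sample: passes_qc(rin, total) for sample, rin, total in rows}


report = qc_report(TABLE_6)
print(all(report.values()))
```

Under these assumptions all twelve samples of Table 6 pass, which agrees with the final column of the table.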

The selection of those patients who went on to develop sepsis and those that did not was the responsibility of the Principal Investigators (PIs) at each centre. They were all consultant intensive care clinicians with many years' experience in the clinic and over 265 peer-reviewed publications between them. Two of the four PIs from the four centres hold prominent advisory roles to journals and funding bodies across Europe and the USA. Selections by the clinicians were double-checked by the project team to ensure that all patients met the previously agreed criteria for the definition of sepsis. Peri-operative antibiotic use was minimal, with only one dose of a broad-spectrum antibiotic given in 85.7% of sepsis patient cases, prior to sepsis diagnosis. The remaining patients received daily doses of antibiotic but still developed clinical evidence of sepsis. Under clinical guidance we have included these patients in the study as they developed sepsis in spite of treatment, although it is possible that such treatment may have influenced microbial culture results. The range of infectious agents that resulted in sepsis in the study was quite broad and is listed in Table 7.

TABLE 7 Infectious agents isolated from sepsis patients in phase I andII of the study. Phase 1 - Discovery Phase II - Validation Escherichiacoli Blood Serratia Haemophilus influenzae Enterobacter speciesPseudomonas aeruginosa Candida species Escherichia coli Stenotrophomonasmaltophilia Klebsiella species Proteus species Klebsiella species Gramnegative bacilli coliforms Clostridium difficile Pseudomonas aeruginosaCandida species Streptococcus pneumoniae Streptococcus pneumoniaecoliforms Staphylococcus aureus Staphylococcus aureus Moraxellacatarrhalis Unidentified Gram Streptococcus species Coagulase-negativenegative bacteria Staphylococcus (CNS) Stenotrophomonas maltophiliaEnterococcus species CDT

Once patients were confirmed as septic, a comparator group was selected that matched each sepsis patient's age, sex and procedure. These patients did not develop SIRS as a result of their surgery. Having regard to FIG. 2, the rationale for comparator selection is illustrated, as well as which patient samples were analysed and how the time frames for patient samples that are taken at different days post-surgery were standardized. It should be noted that the main analytical effort was focused on the 3 days prior to the diagnosis of sepsis as these are most likely to yield useful pre-symptomatic biomarker signatures. The time course of the development of sepsis in a patient is indicated by the Sepsis patient #1 bar. From the large number of patients who do not go on to develop sepsis following surgery, a suitable age/sex/procedure matched control is identified and used as a comparator. In this example the day of diagnosis of sepsis is day 7 post-surgery. Therefore the 3 days before sepsis diagnosis are days 4, 5 and 6 post-surgery. In terms of pre-symptomatic diagnosis these may also be noted as Days −3, −2 and −1. In order to provide a robust and relevant post-operative comparison for each of the 3 days before sepsis diagnosis, the equivalent post-operative blood sample from the comparator was used. In this case the comparator blood samples taken on days 4, 5 and 6 post-surgery were used for comparison, acting as Day −3, −2 and −1 controls. The process of matching the pre-symptomatic blood samples of patients who went on to develop sepsis with their most appropriate post-operative comparators was then repeated in Phase I and II of the study so that the time courses of 30 and 40 patients who go on to develop sepsis were compared to 30 and 40 post-operative comparator patients, respectively.
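The day standardization and comparator matching described above can be sketched in Python as follows. This is an illustrative sketch only: the function names and patient record fields are hypothetical, and choosing the candidate of nearest age among those with the same sex and procedure is an assumption, since the study states only that comparators were matched on age, sex and procedure.

```python
def presymptomatic_days(diagnosis_day, window=3):
    """Map relative days (-window .. -1) to post-surgery days.

    Relative days that would fall before post-surgery day 1 are omitted.
    """
    return {
        -k: diagnosis_day - k
        for k in range(window, 0, -1)
        if diagnosis_day - k >= 1
    }


def pick_comparator(sepsis_patient, candidates):
    """Pick a non-sepsis candidate with the same sex and procedure.

    Assumption: among those, the candidate closest in age is chosen.
    Returns None when no candidate shares sex and procedure.
    """
    same = [
        c for c in candidates
        if c["sex"] == sepsis_patient["sex"]
        and c["procedure"] == sepsis_patient["procedure"]
    ]
    if not same:
        return None
    return min(same, key=lambda c: abs(c["age"] - sepsis_patient["age"]))


# Worked example from the text: sepsis diagnosed on day 7 post-surgery.
print(presymptomatic_days(7))

# Hypothetical patient records, for illustration only.
patient = {"id": "S1", "age": 63, "sex": "F", "procedure": "abdominal"}
pool = [
    {"id": "C1", "age": 40, "sex": "F", "procedure": "abdominal"},
    {"id": "C2", "age": 61, "sex": "F", "procedure": "abdominal"},
    {"id": "C3", "age": 63, "sex": "M", "procedure": "abdominal"},
]
print(pick_comparator(patient, pool)["id"])
```

For a diagnosis on day 7, Days −3, −2 and −1 map to post-surgery days 4, 5 and 6, and the comparator's samples from those same post-surgery days serve as the matched controls.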

In addition to the non-sepsis comparator group, further controls were provided through exploitation of each patient's pre-operative sample as well as samples from patients that developed SIRS and not sepsis. This ensured that any changes observed in the transcriptomes of sepsis patients were a direct result of infection acquired during surgery. A summary of patients used in both phase I and II of the study is given in Table 8. It should be noted that antibiotic use was dictated on a case-by-case basis and at the discretion of the clinician. The study protocol did not influence patient management; ethically we were unable to dictate medical countermeasure use during this study.

TABLE 8 Summary of patient ages, gender, delay for sepsis and types of surgery used in phase I and II

                       Phase I (Discovery)             Phase II (Validation)
                       Sepsis         Controls         Sepsis           Controls
                       n = 30         n = 30           n = 40           n = 40
Age                    63 [48-81]     61 [52-79]       64 [28-79]       64 [24-80]
Gender (female/male)   14/16          14/16            11/29            11/29
Delay for sepsis       3.5 [1-8]      NA               4.75 [1-9]       NA
Surgery Type           Thoracic or    Thoracic or      Thoracic,        Thoracic,
                       abdominal      abdominal        abdominal or     abdominal or
                                                       maxillofacial    maxillofacial

Microarray Analysis (Phase I Biomarker Discovery)—

Illumina® Human HT12v4 Beadarrays were run on the samples from the 60 phase I patients (30× sepsis & 30× comparator), 80 phase II patients (40× sepsis & 40× comparator) and 40 Phase II SIRS patients. This corresponded to 192 transcriptomes analysed during Phase I and 433 transcriptomes analysed during phase II of the study. In Phase I, data were collected for 30 sepsis patients and 30 age, sex and surgery matched controls (or baselines), giving microarray data from 192 blood samples. These represented 4 different time points corresponding to pre-operation and 1, 2 and 3 days prior to the onset of sepsis. Samples were taken for each paired baseliner based on the corresponding day of onset for the sepsis sample, as summarized in Table 9.

TABLE 9 The number of samples used during Phase I of the study.

                Comparator    Sepsis
Pre-op          30            30
Onset Day −1    30            30
Onset Day −2    21            21
Onset Day −3    15            15

The Illumina® Human v4 chip contains 48,804 probes mapping to over 27,000 reference sequence numbers. Each probe is 50 base pairs long, providing a high degree of specificity for each gene. For each sample globin-reduced RNA (GlobinClear™, Life Technologies, USA) was prepared from total RNA. RNA integrity was measured using a Bioanalyzer 2100 (Agilent, USA) and concentration was assessed using a NanoQuant™ (Tecan, USA). cRNA was prepared by amplification and labelling using the Illumina® TotalPrep™ RNA Amplification Kit (Life Technologies) and hybridized to Human HT-12 v4 Beadarrays (Illumina®, USA). The Illumina® HighScanHQ™ then imaged each chip with resulting intensities indicating the expression level of each probe's corresponding gene. Background subtracted data was then generated using GenomeStudio™ Software (Illumina®, USA).

A variety of preliminary or exploratory analyses on the microarray data for Phase I were undertaken to determine whether:

1. There were any batch processing effects on the data.
2. There was a difference between pre- and post-surgical transcriptomes.
3. There was a gross difference between the transcriptomes of patients who went on to develop sepsis and their baseliner comparators.

Batch Effects

3D Principal Component Analysis (PCA) was used to examine whether the day of hybridization of a sample had an impact on the transcriptomes of patients in the study.

The data indicated that samples hybridized on different days did not segregate into distinct groups. This suggested that there was no batch effect amongst the samples according to day of hybridization.

Pre- and Post-Surgical Transcriptomes

3D PCA was also used to indicate whether there were any differences in the transcriptomes of pre- and post-surgery patients. The analysis indicated that the transcriptomes of pre-surgery patients cluster together. This suggests that they are more similar to each other than to the transcriptomes of post-surgery patients. Additionally, the transcriptomes of the entire post-surgery patient samples cluster away from the pre-surgery transcriptomes, suggesting that they too have more in common with each other than with the transcriptomes of pre-surgery patients.

Differences Between Patients Who go on to Develop Sepsis and their Comparators

Like PCA, Hierarchical Clustering is a tool used for unsupervised analysis of data sets. It was used to describe the transcriptomes of both patient groups through use of a heat map. Hierarchical clustering involves the re-ordering of genes in the dataset so that similar transcriptome patterns (expression profiles) are put next to each other. In effect it is a tool that helps identify samples that are related to each other.

Preliminary inspection of the heat map indicated that the pre-surgery samples as well as the transcriptomes of baseliner patients on comparative Days −1, −2 and −3 are clustered near each other, generally in the top half of the heat map. In contrast, the transcriptomes on Days −1, −2 and −3 of patients who go on to develop sepsis seem to cluster near each other near the bottom of the heat map. This suggests that there is a difference in the transcriptomes of patients who go on to develop sepsis and their baseliner comparators.

Following the collection of transcriptomic data from 192 samples, further analysis was required to elucidate key biomarkers whose expression was significantly different between the two patient groups. These host response genes would form the basis of a biomarker signature that could be used to indicate an individual who was likely to develop the symptoms associated with life-threatening disease.

Biomarker Discovery—Microarray (Phase I) Data Pre-Processing

There were three main steps in the data pre-processing:

1. Log transform: a log_e transform was performed on the transcriptomic data to comply with assumptions of normality required for further analysis.
2. Pre-surgery subtraction: in order to obtain the log expression for each sample due to the response to surgery, all samples were normalised to the difference compared with pre-surgery expression levels.
3. Median subtraction: this was important within each gene probe to account for systematic variation.
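The three pre-processing steps can be sketched in Python as below. This is a minimal illustration under stated assumptions rather than the pipeline actually used: the intensities and patient labels are invented, the function names are hypothetical, and the median for step 3 is assumed to be taken per probe across all samples.

```python
import math
from statistics import median


def log_transform(values):
    """Step 1: natural-log (log_e) transform of raw probe intensities."""
    return [math.log(v) for v in values]


def subtract_pre_surgery(post_log, pre_log):
    """Step 2: express each probe as the change from the pre-surgery level."""
    return [post - pre for post, pre in zip(post_log, pre_log)]


def median_centre(matrix):
    """Step 3: subtract each probe's median across samples.

    matrix is a list of samples, each a list of per-probe values.
    """
    n_probes = len(matrix[0])
    medians = [median(row[j] for row in matrix) for j in range(n_probes)]
    return [[row[j] - medians[j] for j in range(n_probes)] for row in matrix]


# Invented intensities for three patients and two probes.
pre = {"P1": [100.0, 50.0], "P2": [80.0, 60.0], "P3": [90.0, 55.0]}
post = {"P1": [200.0, 50.0], "P2": [80.0, 30.0], "P3": [180.0, 55.0]}

patients = sorted(post)
changes = [
    subtract_pre_surgery(log_transform(post[p]), log_transform(pre[p]))
    for p in patients
]
centred = median_centre(changes)
```

After step 3 the median of every probe across samples is zero, so the remaining values describe how each sample departs from the typical post-surgical response.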

Multiple Hypothesis Testing for Determination of Genes of Interest

We used multiple t-tests to discern evidence for significant differences in gene expression (below the threshold p-value assigned), for the 3 days before sepsis diagnosis. The analyses indicated that 452 genes were significantly different between the two patient groups on all 3 days before sepsis diagnosis. We also determined that there was evidence for significant differences between the two groups on each day before sepsis diagnosis. The expression of 91, 1022 and 938 genes showed evidence for significant differential expression on Days 3, 2 and 1 before sepsis diagnosis, respectively.

We then took a similar approach implementing the significance analysis of microarrays (SAM) method (Tusher V G, Tibshirani R, Chu G. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98:5116-5121) as published by R. Tibshirani at Stanford University. This method is commonly used for microarray analysis. We felt this alternative was worth exploring since it was likely to provide an independent validation of the first findings and therefore confidence in the eventual selection of biomarkers for pre-symptomatic diagnosis.

Expression Analysis and Subsequent SAM

Expression analysis was used as a test for the difference in gene expression between groups of subjects based on a known response variable, such as the onset of sepsis. Response variables were generated for 4 different tests, defined using the patient groups in Table 10.

TABLE 10 Patient categories used for expression analysis

                Comparator    Sepsis
Onset Day −1    B1            S1
Onset Day −2    B2            S2
Onset Day −3    B3            S3

The four tests were:

1. S1+S2+S3 vs. B1+B2+B3

2. S1 vs. B1+B2+B3

3. S2 vs. B1+B2+B3

4. S3 vs. B1+B2+B3

The SAM package in the R statistical language software was used to perform the expression analysis for each of the 4 tests described above. For each gene i an expression statistic d_i is calculated from the average difference in expression between the two response groups. This average difference r_i is scaled by the standard deviation s_i, with a small positive constant s_0 added to the denominator, according to the following equation:

$d_{i} = \frac{r_{i}}{s_{i} + s_{0}}; \quad i = 1, 2, \ldots, p$

This statistic has a natural ordering based on magnitude as it measures the strength of the relationship between gene expression and the response variable.
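The statistic can be illustrated with a short Python sketch. It is not the code of the SAM package: s_i is assumed here to be the pooled standard error of the difference in group means (the form given in Tusher et al. 2001), s_0 is passed in as a hypothetical fixed constant rather than estimated from the data, and the gene names and expression values are invented.

```python
import math
from statistics import mean


def sam_statistic(group_a, group_b, s0):
    """Compute d = r / (s + s0) for one gene.

    r is the difference in mean expression between the two groups.
    s is assumed to be the pooled standard error of that difference.
    s0 is a small positive constant (an assumed, fixed value here).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    r = mean_a - mean_b
    sum_sq = (sum((x - mean_a) ** 2 for x in group_a)
              + sum((x - mean_b) ** 2 for x in group_b))
    s = math.sqrt((1.0 / n_a + 1.0 / n_b) * sum_sq / (n_a + n_b - 2))
    return r / (s + s0)


# Invented log-expression values: (sepsis group, comparator group).
genes = {
    "GENE_A": ([2.0, 2.0, 2.0], [1.0, 1.0, 1.0]),
    "GENE_B": ([1.1, 0.9, 1.0], [1.0, 1.1, 0.9]),
}
S0 = 0.5  # hypothetical value chosen for illustration

scores = {name: sam_statistic(a, b, S0) for name, (a, b) in genes.items()}
# Natural ordering: rank genes by the magnitude of d.
ranking = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
```

The constant s_0 keeps genes with very small s_i from producing extreme values of d_i, which is why it appears in the denominator.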

In order to determine which genes are significantly differentially expressed, SAM uses permutation analysis to estimate the local false discovery rate (FDR) at a variety of different test statistic thresholds (delta).

The FDR is fixed at 1% for each test to ensure a consistent risk of falsely identifying significant genes. However, the change in FDR as the threshold changes is dependent on the distribution of expression statistics, and there is often a minimum FDR for any given range of delta.

For example, Table 11 shows the estimated false discovery rate for the diagnosis of sepsis 2 days prior to onset of symptoms.

TABLE 11 90th percentile for the estimated false discovery rate for a range of delta values for sepsis at Day −2.

delta    number of genes called    90th % FDR
1.4      158                       0.015927
1.41     145                       0.016795
1.42     139                       0.01752
1.43     129                       0.013215
1.44     123                       0.0132
1.45     118                       0.013759
1.46     114                       0.007833
1.47     109                       0.007448
1.48     86                        0.009439
1.49     75                        0.010824
1.5      72                        0.011275

The 90th percentile is used as an upper bound on the likely false discovery rate (FDR). An FDR of 1% (0.01) was deemed an acceptable risk. It is clear from the above table that the estimated FDR reaches a minimum at a delta of 1.47 and increases again as delta is increased beyond this value. Since this minimum also satisfies FDR<1%, a delta of 1.47 was chosen, identifying 109 significant genes in total for this diagnosis.
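The choice of delta can be reproduced from Table 11 with a few lines of Python. The sketch below encodes one reading of the selection rule (take the delta with the lowest 90th percentile FDR among those below the 1% limit); the names are hypothetical and the data are copied from Table 11.

```python
# (delta, number of genes called, 90th percentile FDR) from Table 11.
TABLE_11 = [
    (1.40, 158, 0.015927), (1.41, 145, 0.016795), (1.42, 139, 0.01752),
    (1.43, 129, 0.013215), (1.44, 123, 0.0132), (1.45, 118, 0.013759),
    (1.46, 114, 0.007833), (1.47, 109, 0.007448), (1.48, 86, 0.009439),
    (1.49, 75, 0.010824), (1.50, 72, 0.011275),
]

FDR_LIMIT = 0.01  # 1% false discovery rate deemed acceptable


def choose_delta(rows, limit):
    """Return the row with the lowest FDR among rows below the limit.

    Returns None when no delta satisfies the limit.
    """
    eligible = [row for row in rows if row[2] < limit]
    if not eligible:
        return None
    return min(eligible, key=lambda row: row[2])


delta, genes_called, fdr = choose_delta(TABLE_11, FDR_LIMIT)
print(delta, genes_called)  # prints: 1.47 109
```

Three deltas (1.46, 1.47 and 1.48) fall below the 1% limit; 1.47 has the lowest estimated FDR and calls 109 genes.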

As a consequence of this approach we identified 458 genes whose expression was different between the 2 patient groups for all 3 days prior to the onset of sepsis. In addition 167, 179 and 226 genes were found to be specifically differentially expressed between the patient groups on Days −3, −2 and −1, respectively. Unique to this test were 163 of the total number of genes, 18 for Day −1, 12 for Day −2, and 51 for Day −3.

Models

Any biomarkers selected for further validation must be mathematically modelled so that their performance can be assessed both qualitatively and quantitatively. It is however important to determine a useful model by:

ensuring any assumptions are fit for the purpose of the analysis,

determining precedent for the choice of model, unless the analysis is a new approach,

undertaking an appropriate sensitivity analysis to determine the limitations of the model,

correlating the model itself with scientific rationale.

Within the field of biostatistics and bioinformatics, there are many analysis pathways and algorithms (or models) available. It would be impossible to use all of these approaches to help select and validate the most appropriate biomarkers for pre-symptomatic diagnosis of sepsis. In the context of this project the criteria for the analyses used are described in Table 12, where a number of approaches are gradually discounted due to likely model requirements.

TABLE 12 Down-selection of models used for biomarker selection and analysis.

Model Requirement: Data are non linear (shown in Lukaszewski et al. 2008)
Potential Models: Kernel based PCA, Support Vector Machines (SVM), Quadratic regression, Decision Trees, Random Forests, Artificial Neural Networks (ANN), Quadratic Discriminant Analysis (QDA), Naive Bayes classifier, K-Nearest Neighbour Analysis (KNN) and Factor analysis.

Model Requirement: Solution needs to be resolved quickly
Potential Models: Kernel based PCA, SVM, Quadratic regression, Random Forests, ANN, QDA, Bayes classifier, KNN, factor analysis.

Model Requirement: Due to potential use, model needs to learn and adapt to new variation in data.
Potential Models: From list above, quadratic regression, factor analysis and KNN will not fit this criterion.

Model Requirement: Provide a classification.
Potential Models: PCA will not provide a classification algorithm, generally used for exploratory analysis.

Model Requirement: What's left?
Potential Models: SVM, Decision Trees, Random Forests, ANN, QDA and Bayes classifier

Several models were generated to determine the best fit.

Analysis 1

Support vector machines (SVMs), Random Forests and Differential analysis were used to identify genes for down selection for targeted qRT-PCR on the Fluidigm array. Survival analysis, which makes use of longitudinal information, was also used. All analysis was carried out in R 2.14.0 with relevant R packages.

SVMs and random forests are supervised machine learning algorithms commonly used as bioinformatic tools. The ease of variable (gene) selection provided by these methods was a key factor in their adoption. An SVM uses observations to find a hyper-plane that best separates two labelled groups. The Random Forest algorithm is an ensemble classifier, which uses bagging to create many independent classification trees. Each tree has its own training dataset, a subset comprising approximately 66% of the original observations, with the remaining samples used to determine that tree's accuracy. Each classification tree was created using a random subset of variables, allowing genes to be ranked based on a measure of how strongly they influence tree accuracies called the mean Gini coefficient. Random forests are probabilistic classifiers yielding a value between 0 and 1 indicating the probability that a given sample belongs to a particular class.

Survival analysis was also employed to find probes that play a role in the development of sepsis. The method's main attraction is that it allows microarray data from different days to be incorporated into the model whereas the machine learning approaches use only one point to find important genes. However, the technique was not developed for prediction and creates a separate model for each gene. Similar to the t-statistic from standard differential expression, a test statistic is computed for each gene that is then used to rank genes.

The SVM, Random Forest, differential expression, and survival analysis approaches showed significant overlap in gene selection when analysing Phase I microarray data, as detailed in Table 13. The top 531 genes prior to sepsis found by random forest, SVM and survival analysis using all post-operation time samples overlapped greatly with genes found by differential expression analysis. All overlaps were highly significant (p-value<0.001) and the numbers of overlapping genes are given in Table 13.

TABLE 13 Differential expression analysis of expressed genes - overlapping genes between different models.

Method               SVM    Survival Analysis    Differential Expression
Random Forest        154    140                  59
SVM                         255                  98
Survival Analysis                                91

Prediction rates using these data were then calculated through Random forests and support vector machines (SVMs). Individual days (pre-op, day −1, etc.) were split into sepsis and control. For day −1, day −2, and day −3 predictions were made with pre-op normalization by division, by subtraction, and without normalization. Averages across days were also considered, for example day −1 and day −2 averaged for each patient. In order to maintain the assumption that samples were independent, no days were grouped together into a meta group (in either sepsis or control). For survival analysis all data were used under the assumption, known to be false, that each time point was equally spaced (the time between pre-op and day −3 was in fact variable).

Random Forest Prediction of Sepsis

Random forests are composed of many simple tree classifiers, each based on a different random subset of samples for training and testing each tree (70% vs. 30%), thus allowing for accurate estimates of error rates. Below are the sensitivity and specificity for predicting sepsis in each time grouping. Note that Day −2 and Day −3 (D−2 and D−3) have smaller sample sizes. Normalizing by pre-op (by division of unlogged data) and combining days showed that averaging Days −1 and −2 yields the most accurate results. Normalization by subtraction (not shown) performed no better than normalization by division.

TABLE 14 Performance of identified genes using Random Forests

Filtered    Sensitivity    Specificity    Error Rate
Pre-op      0.667          0.643          0.345
D−3         0.786          0.8            0.207
D−2         0.95           0.857          0.0976
D−1         0.778          0.786          0.218
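For reference, the three performance measures reported in Table 14 can be computed from true and predicted class labels as in the following Python sketch. The labels are invented for illustration, the function name is hypothetical, and the sketch assumes that both classes are present in the true labels.

```python
def classification_summary(truth, predicted):
    """Return sensitivity, specificity and error rate.

    truth and predicted are equal-length sequences of 1 (sepsis)
    and 0 (non-sepsis). Both classes must occur in truth.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # sepsis cases correctly called
        "specificity": tn / (tn + fp),   # controls correctly called
        "error_rate": (fp + fn) / len(pairs),
    }


# Invented labels: four sepsis patients followed by four controls.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
summary = classification_summary(truth, predicted)
```

In this invented example one sepsis patient and one control are misclassified, giving a sensitivity and specificity of 0.75 and an error rate of 0.25.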

To provide a comparison to random forests, a Support Vector Machine (SVM) with a Wilcoxon test to allow for non-normally distributed probes was employed (Table 15). We concentrated on the Day −1 and Day −2 average given that this performed the best, and used 5-fold cross validation with 20% of the data held out for testing in each fold. Standard errors are shown in parentheses.

TABLE 15 Performance of identified genes using Support Vector Machines.

                      Sensitivity      Specificity      Error Rate
D−1n2 ave             0.8 (0.082)      0.69 (0.027)     0.253 (0.044)
D−1n2 ave Filtered    0.853 (0.037)    0.807 (0.11)     0.164 (0.6)

Both approaches demonstrated acceptable, but not outstanding, differentiations between the two patient groups. This suggested that other techniques may be useful when trying to model these datasets.

Analysis 2

Artificial Neural networks (ANNs) provide the ability to predict classes of data given an unknown pattern in a set of example data, and have been used successfully in a pilot study. The neural network analysis is described by the following process. This was performed separately 5 times to show possible changes in predictive ability.

1. Gene expression data were identified based on SAM analysis for each separate test: all sepsis, sepsis Day −1, −2 and −3. These data were normalized by subtracting the median and scaling by the standard deviation for each gene.
2. Normalized data were split into a 70% subset used for training and a 30% subset used for validating the neural network.
3. The neural network was trained and the weights for each hidden unit were used to form a predictor for new data.
4. The 30% subset was passed through the predictor and a probability was given for assignment to each of the two groups (Sepsis, Non-sepsis).
5. The predictive ability of the neural network was based on specificity and sensitivity estimated from the 30% unknown data set.
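The data handling in steps 1 and 2, together with the five repetitions, can be sketched in Python as follows. The sketch deliberately omits the network training of steps 3 to 5. The function names, the random seed and the use of the sample standard deviation are assumptions made for illustration only.

```python
import random
from statistics import median, stdev


def normalise_gene(values):
    """Step 1: subtract the median and scale by the standard deviation.

    Assumes the sample standard deviation and at least two
    non-identical values (otherwise the scale would be zero).
    """
    centre = median(values)
    scale = stdev(values)
    return [(v - centre) / scale for v in values]


def split_70_30(n_samples, rng):
    """Step 2: shuffle sample indices into 70% training, 30% validation."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = round(0.7 * n_samples)
    return indices[:cut], indices[cut:]


def repeated_splits(n_samples, repeats=5, seed=0):
    """Draw an independent 70/30 split for each of the repeated runs."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility
    return [split_70_30(n_samples, rng) for _ in range(repeats)]


splits = repeated_splits(60)
for train_idx, validation_idx in splits:
    # A network would be trained on train_idx and its sensitivity and
    # specificity estimated on validation_idx (steps 3 to 5, not shown).
    assert not set(train_idx) & set(validation_idx)
```

With 60 samples each repetition uses 42 samples for training and 18 for validation; averaging the validation results over the five repetitions gives summary figures of the kind reported for the neural networks.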

Average specificity and sensitivity were then obtained from the five separate neural networks, and the results are summarized in Table 16.

TABLE 16 Summary results for prediction of sepsis on different days with intervals based on standard error of the five repeated predictors, as based on the genes identified by SAM analysis.

Test                 Sensitivity   Standard Deviation (+/−)   Specificity   Standard Deviation (+/−)
sepsis on any day    89.7%         7.4%                       89.4%         7.2%
sepsis on Day −1     70.8%         11.8%                      91.6%         3.3%
sepsis on Day −2     73.4%         12.7%                      93.5%         5.4%
sepsis on Day −3     72.8%         13.0%                      94.7%         6.5%

Neural network analysis is restricted by the number of samples which can be used to estimate the sensitivity/specificity, since new data must be passed through the predictor. We fully accept that with this data set, other classification techniques may be able to provide more accurate results based on a larger number of patients. However, it did perform better than the techniques used in Analysis 1.

Having regard to the 458 genes identified in the SAM analysis, and due to the natural ordering of the magnitude of the test statistic, it was possible to select genes from the top of this list in order to find a smaller subset with a similar predictive ability (data not shown).

As a consequence of the different analyses conducted in Analysis 1 and Analysis 2, a down-selection of biomarkers was conducted. Those biomarkers identified as most predictive by any of the techniques outlined above were selected to be taken forward for further analysis using the Fluidigm q RT-PCR array system. In total 270 genes were selected along with 6 housekeeping genes (BRD7, PWWP2A, RANBP3, TERF2, SCMH1, FAM105B), which were selected based on consistent expression across all samples.

Fluidigm Confirmation and Quantitation of Microarray Biomarkers—

The Fluidigm BioMarkHD was used to profile 270 genes in the 60 Phase I and 80 Phase II samples taken at all time points. The BioMarkHD™ is a qPCR assay that runs 96 primer-probe pairs in 96 samples. Specifically, globin-reduced RNA (GlobinClear™, Life Technologies, USA) was converted to cDNA (High Capacity RT kit, Life Technologies, USA) and preamplified by limited PCR (PreAmp™ Master Mix, Life Technologies, USA) with a pool of primers (DeltaGene, Fluidigm, USA) for all assays of interest (in this case, a pool of 276 assay primer pairs). Preamplified cDNAs were treated with Exonuclease I (New England Biolabs, USA) and diluted to remove unused primers and dNTPs and to prepare samples for qPCR. Preamplified samples were combined with 2× SsoFast EvaGreen Supermix with Low ROX (Bio-Rad, USA) and 20× DNA Binding Dye Sample Loading Reagent (Fluidigm, USA), and assays (primer pairs) were combined with 2× Assay Loading Reagent (Fluidigm, USA). Sample and assay mixes were loaded onto a 96 by 96 Dynamic Array IFC for real-time PCR analysis on a BioMarkHD™ (Fluidigm, USA). Pre-processing was performed using Fluidigm's Real Time PCR Analysis Software to determine cycle threshold (Ct) values, using linear (derivative) baseline correction and auto-detected, assay-specific threshold determination. Reference or housekeeping genes were used to normalize each assay and produce delta Ct values. Reference samples were then used to normalize all samples on the plate, with the resulting values referred to as delta-delta Ct values. Three 96 by 96 plates were used to profile all 270 genes plus 6 housekeeping genes on each plate.
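The delta Ct and delta-delta Ct normalization can be illustrated with a short Python sketch. The source does not spell out the arithmetic, so the conventional form is assumed here: the mean Ct of the housekeeping genes is subtracted from the target Ct, and the reference-sample delta Ct is then subtracted from the sample delta Ct. The function names and Ct values are hypothetical, and the final fold-change line assumes ideal doubling per PCR cycle.

```python
from statistics import mean


def delta_ct(target_ct, housekeeping_cts):
    """Delta Ct: target-gene Ct minus the mean Ct of the housekeeping genes."""
    return target_ct - mean(housekeeping_cts)


def delta_delta_ct(sample_delta_ct, reference_delta_ct):
    """Delta-delta Ct: sample delta Ct minus the reference-sample delta Ct."""
    return sample_delta_ct - reference_delta_ct


# Illustrative Ct values only (not study data).
sample_dct = delta_ct(24.0, [20.0, 21.0, 19.0])      # 24.0 - 20.0
reference_dct = delta_ct(26.0, [20.5, 20.5, 20.5])   # 26.0 - 20.5
ddct = delta_delta_ct(sample_dct, reference_dct)
fold_change = 2 ** (-ddct)  # assumes ideal doubling per PCR cycle
```

A negative delta-delta Ct indicates that the target is more abundant in the sample than in the reference, since a lower Ct means earlier detection.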

A total of 269 samples (the original Phase I 192 plus extra samples from the comparator group) from the 60 Phase I patients and 439 samples from the 80 Phase II patients (plus 40 SIRS patient samples) were profiled via the Fluidigm BioMark™. It should be noted that data from Phase I of the study were analysed first. The initial analysis used SVM to assess the performance of the down-selected biomarker list. The array was very good at identifying non-sepsis comparator patients, as detailed in Table 17. However, its performance for positive identification of sepsis patients diminished with time from sepsis diagnosis. Furthermore, when the data for the three days prior to sepsis were pooled, the array gave an overall predictive accuracy of 78.8%.

TABLE 17 Performance (%) of down-selected biomarkers in prediction of pre-symptomatic sepsis patients and their comparators on Phase I data using the Fluidigm array

          Comparator                          Sepsis
          No. of      % Predictive            No. of      % Predictive
          patients    Accuracy                patients    Accuracy
DAY −1    29          100                     30          90
DAY −2    25          100                     21          71.4
DAY −3    22          100                     15          66.67
DAY −4    20          95                      11          54.5
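The pooled figure of 78.8% is consistent with a patient-weighted average of the sepsis accuracies in Table 17 for Days −1 to −3. The short check below assumes that this is how the pooling was done, which the text does not state explicitly; the variable names are illustrative.

```python
# Sepsis columns of Table 17 for the three days before diagnosis:
# (number of patients, % predictive accuracy)
sepsis_rows = {
    "DAY -1": (30, 90.0),
    "DAY -2": (21, 71.4),
    "DAY -3": (15, 66.67),
}

total_patients = sum(n for n, _ in sepsis_rows.values())
# Weight each day's accuracy by its number of patients.
pooled_accuracy = sum(n * acc for n, acc in sepsis_rows.values()) / total_patients
print(f"{total_patients} patients, pooled accuracy {pooled_accuracy:.1f}%")
```

This corresponds to 52 of 66 sepsis samples being classified correctly across the three days.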

Optimising the Biomarker Signature

Due to the performance of the classifiers within Phase I, it was decided that the biomarker list required updating in the light of new knowledge. The results from Phase I enabled down-selection of the gene list based on differential analysis.

Biomarker Validation with Phase II Samples: Blind Testing with Independent Data Sets.

A fresh set of patient samples was obtained over the course of the study and used to validate the down-selected genes from Phase I. All RNA samples were prepared, blinded and sent for analysis using microarray and Fluidigm array analysis. 266 genes were determined through several methods, such as SAM analysis, as mentioned above. This gene set was then further reduced through the use of measures taken from the classifying algorithms used, such as the Gini coefficient in the random forest classifier.

Two groups of classifiers were then selected, one with 45 genes and the other with 25 genes. These are indicated in Table 18.

TABLE 18 45 and 25 gene classifiers whose predictive accuracy was tested in Phase II of the study using microarray and Fluidigm array analysis of 433 blinded RNA samples.

45 Gene Classifier: ATP9A, C16ORF7, C5ORF39, C9ORF103, CACNA1E, CD177, DHRS3, EEF1B2, FCER1A, FLT3LG, GAS7, GRB10, HLA.DMA, HS.445036, IL1R1, IL1R2, LOC285176, MYBPC3, NCOA3, NDST2, RPL10A, EBI2, LOC646483, RPL13A, RPL18, RPL18A, RPL32, RPL36, RPL9, RPS20, RPS29, RPS6, SIGIRR, SLBP, SLC26A6, SMPDL3A, SORBS3, TCEA3, TCTN1, THBS3, THNSL1, TIMM9, TOMM7, ZFAND1, ZNHIT3

25 Gene Classifier: C16ORF7, C5ORF39, C9ORF103, CD177, FCER1A, GAS7, LOC285176, MYBPC3, NDST2, EBI2, RPL13A, RPL18A, RPL32, RPL36, RPL9, RPS20, RPS29, RPS6, SIGIRR, TCEA3, TCTN1, TIMM9, TOMM7, ZFAND1, ZNHIT3

We made predictions as to the type of patient that each sample had come from and sent them to be unblinded. The performance of these classifiers is summarised in Table 19.

TABLE 19 Performance (%) of biomarker classifiers - (A) predictive accuracies for samples from Comparator patients, (B) predictive accuracies for samples from pre-symptomatic sepsis patients

               Number of   Equivalent Day   Microarray   Microarray   Fluidigm   Fluidigm
               patients    Pre-SEPSIS       SepClass     SepClass     SepClass   SepClass
                           Diagnosis        25Genes      45Genes      25Genes    45Genes
A   CONTROL    34          DAY −1           91.2         91.2         61.8       82.4
    CONTROL    30          DAY −2           86.7         86.7         60.0       76.7
    CONTROL    27          DAY −3           88.9         88.9         63.0       74.1
    CONTROL    22          DAY −4           81.8         81.8         63.6       72.7
B   SEPSIS     37          DAY −1           64.9         64.9         91.9       75.7
    SEPSIS     31          DAY −2           71.0         71.0         90.3       80.6
    SEPSIS     28          DAY −3           67.9         67.9         92.9       78.6
    SEPSIS     21          DAY −4           81.0         81.0         95.2       76.2

Table 19 demonstrates that the analysis undertaken can classify, to a given level, between sepsis patients and their non-sepsis comparators. The table also shows that the further away from the day of sepsis diagnosis, and as the N reduces, the classifier performance increases.

Biomarker Assessment—

Following the decision to remove some of the biomarkers from the original Fluidigm array, we re-constituted 31.5% of them with additional candidate biomarkers identified for Phase II. This final list of biomarkers still held the 180 highly significant genes identified and used for Fluidigm validation during Phase I. Additional candidate biomarkers identified by SAM analysis were added to increase the likelihood of finding key pre-symptomatic biomarkers. This enabled the gene list to keep the genes determined to be optimal while adding other genes that the differential analysis showed to be significant. The final list is given in Table 20.

TABLE 20 Final Down-Selected genes for use on Fluidigm array during Phase II

ACTR6 EBI2 CXORF42 SORBS3 RPL11 SLC26A8 ATP2A2 BIN1 GAS7 CLASP1
TIMM9 PPP2R2B WDR37 ZNF608 C16ORF7 HIST2H4B CD2 TST NOL11 ZNF17
TBC1D8 CD247 IL1R1 C14ORF112 CCDC65 GZMK ANKS1A RRBP1 CLNS1A LGALS2
BCL6 NCOA3 ZNF32 CD59 RPL26 CYB561 LTA MRPL24 PDCD4 TMEM42
EIF3D PHCA FCER1A EEF1B2 LOC646483 RASGRP1 TCEA3 GYG1 NSUN7 GRB10
CTSS KLRG1 RPL18A SLC2A11 KIF1B LETMD1 HS.445036 CD7 HLA-DRA RPS14
SERTAD2 MMP9 IRAK3 LARP5 CACNA1E GRAMD4 RPS6 RPS20 PAG1 FAM160A2
LOC646766 C12ORF57 MRPS6 SIVA RPL38 RPL19 CTDP1 MRPL50 AOC2 OLFML2B
SS18L2 RPL12 RPS15 ATP8B4 ADRB2 LY6E PTPRCAP TMC6 PRKCQ SLC36A1
RPS3A BOAT LOC285176 RPL13 TTLL3 OLFM1 WWP1 TDRD9 C21ORF7 IL1R2
RPL7A CDO1 HLA-DRB3 ARG1 RUNX1 CD3D HLA-DMA RPS27 RPSA ZNF430
CKAP4 RPL27A CPA3 GBP1 SH2D1A RPS15A TOMM7 EMILIN2 PHTF1 DHRS3
EOMES SMAD2 RPL30 TCTN1 HIBADH NT5DC2 FLT3LG CUTL1 THBS3 RCN2
SLC38A10 MUC1 LOC153561 GTPBP8 CD96 TP53BP2 PECI ACVR1B PFKFB2 ITGAM
ICAM2 CCL5 ZNHIT3 NDST2 C13ORF23 RPL22 FBXO34 LDHA C12ORF62 LEPROTL1
EFCBP1 DACH1 RPS25 CYP1B1 LOC652071 ASNSD1 MS4A4A ZFAND1 FBXW2 SLC41A3
ATXN7L3 MRPS27 MAFG P117 TMEM150 ITGAX ZC3H3 TRPM2 AKR1B1 LOC644096
PYHIN1 SSBP2 LOC647099 NAPB RPL4 BTBD11 IL32 RPL13A SLBP OPLAH
LARP4B PLAC8 C5ORF39 HLA-DMB RPL9 RTP4 PTPN1 HIPK2 CD3E GBP4
RPS29 RPS17 RPL5 EXOC7 CR1 EXOSC5 SIGIRR RPL32 SIL1 CMTM4
DIP2A CXORF20 SMPDL3A RPL10A UPP1 ARID5B GALM CDKN2AIP THNSL1 POP5
TFB1M ZDHHC19 HDC CD177 TRAT1 NMT2 AMD1 SORT1 ICOS C12ORF65
OSTALPHA FAM26F C22ORF9 RPS8 LDOC1 ATP9A MYBPC3 ZNF195 DNAJC5 RPL24
LSG1 METTL7B P2RY5 TMEM204 GOLGA1 PGD AMPH LOC646200 RARRES3 TBCC
KIAA1881 NLRC4 C11ORF1 ITM2A RPL18 SLC26A6 MACF1 LDLR C9ORF103 HLA-DPA1
RPS10 SELM P4HB HK3 CD6 GPR107 RPS5 RPS18 RPL15 EXT1
CRIP2 FAM69A SIRPG RPL36 RPS13 CSGALNACT2

In order to understand why these candidate biomarkers are indicative of sepsis in the two patient populations, the pathways and networks affected by changes in the expression of these down-selected genes were analysed using GeneGo software. The complement, epithelial to mesenchymal and cytoskeletal remodelling pathways had the highest proportion of genes that were over-expressed of all host pathways at 1 day prior to sepsis diagnosis. In contrast, pathways associated with immune cell and G protein signalling had the highest proportion of down-regulated genes of host response pathways at 1 day prior to sepsis diagnosis. The inflammatory, apoptosis and cell adhesion networks were most up-regulated in the healthy comparator patients, and consequently most down-regulated in the sepsis patient group. A similar pattern was observed for the networks controlling protein translation, antigen presentation and T cell receptor signalling.

q RT-PCR Validation of Highlighted Biomarkers: Blind Testing with Independent Data Set (Phase II Samples).

Given the performance of the first down-selected biomarker signatures, it was decided that we would also down-select a second set of biomarkers that would give good predictive accuracies with lower numbers of genes. In order to compare with the first set, two pre-symptomatic biomarker classifiers consisting of 44 and 25 genes were down-selected by taking random samples of the 266 gene list and determining the predictive accuracy of the ANN. The 44 and 25 genes listed in Table 21 were the gene lists that gave the highest predictive accuracy.

TABLE 21 44 and 25 gene classifiers whose predictive accuracy was tested in Phase II of the study using Fluidigm array analysis of 433 blinded RNA samples

44 Gene Classifier: ACTR6, BIN1, C16ORF7, CD247, CLNS1A, CYB561, FCER1A, GRB10, HS.445036, LARP5, LOC646766, MRPL50, ADRB2, BOAT, C21ORF7, CD3D, CPA3, DHRS3, FLT3LG, GTPBP8, ICAM2, LDHA, LOC652071, MRPS27, AKR1B1, BTBD11, C5ORF39, CD3E, CR1, DIP2A, GALM, HDC, ICOS, LDOC1, LSG1, AMPH, C11ORF1, C9ORF103, CD6, CRIP2, EBI2, GAS7, HIST2H4B, IL1R1

25 Gene Classifier: ACTR6, BIN1, C16ORF7, CD247, CLNS1A, CYB561, FCER1A, GRB10, HS.445036, LARP5, LOC646766, MRPL50, ADRB2, BOAT, C21ORF7, CD3D, CPA3, DHRS3, FLT3LG, GTPBP8, ICAM2, LDHA, LOC652071, MRPS27, AKR1B1

Exploratory Analysis

Principal component analysis (PCA) was performed for sepsis, SIRS and comparator patient data using 266 genes on the validation cohort. The separation of the three groups of patients allowed further analysis to be undertaken, as it demonstrated that there was a separation between the groups to be found by the classification algorithms. This separation was made more noticeable when PCA, using the Dstl 44 gene classifier, was used on the validation cohort.

ANN Results

The ANN approach undertaken has already been described as part of the Phase I work. The training and testing (70:30) was undertaken using data/samples taken from 70 sepsis patients and 70 comparators (a combination of Phase I and Phase II patients), at different time points corresponding to pre-operation, and 1, 2, and 3 days prior to the onset of sepsis, and thus using over 600 samples. Summary results for prediction of sepsis on different days with intervals based on standard error of the five repeated predictors are shown in Table 22, using the artificial neural network detailed in Table 2.

TABLE 22 Summary results for prediction of sepsis through use of an ANN.

Test             Predictive   Standard    Sensitivity   Standard    Specificity   Standard
(no. of genes)   Accuracy     Deviation                 Deviation                 Deviation
                              (+/−)                     (+/−)                     (+/−)
266              95.2%        3.9303%     4.7124%       5.3014%     5.5714%       3.0000%
44               97.24%       1.4046%     1.7348%       1.0897%     4.4074%       2.4856%
25               92.00%       3.7202%     9.3839%       8.7481%     8.2601%       4.2226%

The results demonstrate that the ANN can classify sepsis and non-sepsis patients to a high degree of confidence. This confidence shows little variation when reducing the number of genes in the classifier, for example to 25. The results also suggest that there is an optimal number of genes on which to classify.

Neural Network Analysis—SIRS

A potential confounder for these results is the possibility that the biomarker signature of patients who develop SIRS but NOT sepsis is similar to or overlaps with those for sepsis patients. This could give rise to false positives and undermine the predictive value of the pre-symptomatic biomarker signature for sepsis. This would lead to a lack of confidence in the findings. For this reason, data for 40 SIRS patients were run through the ANN against the sepsis biomarker signature, the results of which are shown in Table 23.

TABLE 23 Summary results for prediction of sepsis vs. SIRS on different days with intervals, based on standard error of the five repeated predictors

Test             Predictive   Standard    Sensitivity   Standard    Specificity   Standard
(no. of genes)   Accuracy     Deviation                 Deviation                 Deviation
                              (+/−)                     (+/−)                     (+/−)
266              95.22%       10.25%      0.2           0.447214    0.002         0.004472
44               100.00%      0.00%       0             0           0             0
25               99.84%       0.36%       0             0           0.001961      0.004384

The results in Table 23 show that the ANN effectively classifies between sepsis and SIRS patient biomarker signatures. Again, as with Table 22, there appears to be an optimum number of genes for the classification. Furthermore, the difference between the results for 266 genes and 44 genes suggests that the 266 gene biomarker list does have a commonality with the SIRS signature within it.

Through the production of 44,14 combinations/biomarker signatures of 44 biomarkers, randomly selected from the list of 266, it has been shown that all combinations have a mean predictive accuracy of greater than 75% (in fact above 76.1%). The abundance of individual genes in the top and bottom 1000 subsets is not uniform; the genes which appear more frequently in the top subsets of 44 appear less frequently in the bottom subsets, and vice versa.

These results are illustrated by the 15 specific combinations listed in Table 24, which have the accuracies shown in FIG. 3. Thus in one embodiment the biomarker signature comprises at least 44 genes selected from the list of genes consisting of the 266 genes listed in Table 1.

TABLE 24 Fifteen combinations of 44 biomarkers tested for predictive accuracies. The predictive accuracies are illustrated in FIG. 3.

1           2           3           4           5           6           7           8
CYB561      BIN1        ACTR6       BOAT        ACTR6       BIN1        C16ORF7     CYB561
GRB10       FCER1A      BIN1        LOC652071   BIN1        ADRB2       LARP5       FCER1A
BTBD11      LOC646766   LOC646766   DIP2A       C16ORF7     BOAT        C21ORF7     MRPL50
CD3E        MRPL50      ICAM2       C11ORF1     CD247       FLT3LG      GTPBP8      ADRB2
EBI2        ADRB2       LOC652071   CD6         CLNS1A      LOC652071   LDHA        LOC652071
CD7         CD3E        ICOS        HIST2H4B    CYB561      CR1         MRPS27      ICOS
LOC285176   CACNA1E     CD7         EEF1B2      FCER1A      DIP2A       BTBD11      AMPH
HLA-DMA     AOC2        IL1R2       C12ORF57    GRB10       LY6E        HDC         C9ORF103
C12ORF62    LY6E        ASNSD1      IL1R2       HS.445036   HLA-DMA     CRIP2       C12ORF57
ASNSD1      GBP4        MAFG        CXORF20     LARP5       GBP1        IL1R1       AOC2
GPR107      CDKN2AIP    GBP4        CD177       LOC646766   MAFG        CACNA1E     HLA-DMA
BCL6        METTL7B     CXORF20     CD2         MRPL50      LOC644096   LOC285176   CDKN2AIP
MRPL24      HLA-DPA1    HLA-DPA1    BCL6        ADRB2       ATP9A       HLA-DMA     C12ORF65
RPL7A       LOC646483   CXORF42     LOC646483   BOAT        CLASP1      ASNSD1      ATP9A
RPL13A      MRPS6       MRPL24      HLA-DRA     C21ORF7     BCL6        LOC644096   LOC646200
RPS5        OLFML2B     PTPRCAP     PTPRCAP     CD3D        RPL7A       IL32        HLA-DPA1
CCDC65      SH2D1A      RPL7A       TP53BP2     CPA3        THBS3       FAM69A      LOC646483
NCOA3       SIRPG       PYHIN1      SORBS3      DHRS3       P2RY5       MRPL24      RPL7A
RASGRP1     RPS14       RASGRP1     NCOA3       FLT3LG      RPS5        HLA-DRA     TP53BP2
RPS6        RPS15A      RPL30       RPS6        GTPBP8      RASGRP1     MRPS6       ZNHIT3
NMT2        RCN2        EFCBP1      SS18L2      ICAM2       RPS14       RPS27       RPL13A
ZNF32       EFCBP1      TMEM150     RPS15A      LDHA        RPSA        PYHIN1      TRAT1
SERTAD2     NMT2        RPL32       ZFAND1      LOC652071   RPS17       SIGIRR      P2RY5
RPL38       NOL11       ZNF195      RPS18       MRPS27      NMT2        SMPDL3A     RARRES3
SLC38A10    ZNF32       TMEM204     NOL11       AKR1B1      SLC26A6     P2RY5       CCDC65
ACVR1B      SLC2A11     SELM        TMEM42      BTBD11      DACH1       RARRES3     RPS6
P4HB        OLFM1       RPS18       RPL12       C5ORF39     RPL5        TTLL3       RPSA
SLC26A8     LOC647099   PPP2R2B     ZNF430      CD3E        TFB1M       RPS15A      RCN2
WDR37       OPLAH       OLFM1       SLC38A10    CR1         AMD1        RPL36       RPL32
PAG1        ZNF17       TCTN1       ACVR1B      DIP2A       MACF1       TMEM42      RPL36
RPL19       KIF1B       DACH1       DACH1       GALM        KIF1B       HLA-DRB3    RPL12
SLC41A3     RPS15       ITGAX       UPP1        HDC         SLC36A1     FBXW2       OLFM1
LARP4B      MUC1        TFB1M       GOLGA1      ICOS        WWP1        LOC647099   ZNF430
ZDHHC19     PFKFB2      GYG1        MACF1       LDOC1       PFKFB2      ZNF17       TOMM7
SORT1       NAPB        MMP9        P4HB        LSG1        RPS25       CD59        GOLGA1
NLRC4       LDLR        PAG1        SLC36A1     AMPH        CMTM4       KIF1B       ZNF17
EXT1        ATP2A2      RPS15       CKAP4       C11ORF1     RPS8        SLC36A1     CD59
ATP2A2      RRBP1       CKAP4       PFKFB2      C9ORF103    PGD         PFKFB2      PAG1
ZNF608      IRAK3       RPL22       ZC3H3       CD6         TBC1D8      SLC41A3     RPL19
RRBP1       ATP8B4      ZDHHC19     CMTM4       CRIP2       LETMD1      EXOC7       ZC3H3
TDRD9       PHTF1       SORT1       ZDHHC19     EBI2        IRAK3       HK3         NAPB
RUNX1       NT5DC2      NLRC4       NLRC4       GAS7        RPS3A       ATP2A2      HIPK2
LOC153561   LOC153561   CSGALNACT2  PHTF1       HIST2H4B    PHTF1       LETMD1      ZNF608
ITGAM       FBXO34      RRBP1       TRPM2       IL1R1       NT5DC2      ITGAM       PHCA

9           10          11          12          13          14          15
CD6         BCL6        CUTL1       CDKN2AIP    LOC652071   EBI2        AOC2
CD247       CLNS1A      GTPBP8      CLNS1A      BIN1        CD247       LOC646766
CLNS1A      CYB561      LDHA        ADRB2       CYB561      FCER1A      BOAT
C5ORF39     FCER1A      BTBD11      FLT3LG      GRB10       ICAM2       DIP2A
GALM        C21ORF7     ICOS        GTPBP8      LARP5       C5ORF39     CRIP2
ICOS        FLT3LG      C11ORF1     IL1R1       LOC646766   EEF1B2      CUTL1
AOC2        CTSS        CD6         CD7         AKR1B1      C12ORF57    CD96
IL1R2       CD96        LGALS2      CACNA1E     DIP2A       AOC2        HLA-DMB
CUTL1       CCL5        GBP1        GBP4        LSG1        EOMES       EXOSC5
CDKN2AIP    HLA-DMB     ASNSD1      EXOSC5      EBI2        IL32        ITM2A
ITM2A       CDKN2AIP    CDKN2AIP    CXORF20     EEF1B2      GBP4        HLA-DPA1
CLASP1      GPR107      ITM2A       CD177       CD7         CD177       BCL6
C14ORF112   CXORF42     C14ORF112   METTL7B     LY6E        HLA-DPA1    MRPL24
BCL6        CLASP1      BCL6        HLA-DPA1    C12ORF65    C14ORF112   LOC646483
LOC646483   RPS27       HLA-DRA     CD2         METTL7B     BCL6        KLRG1
RPS27       SMAD2       RPS29       RPS27       ITM2A       MRPL24      GRAMD4
P117        ZNHIT3      OSTALPHA    TP53BP2     LOC646483   RPS27       PYHIN1
RPL9        RPL13A      TST         SIGIRR      SIGIRR      P117        SMPDL3A
RPS10       SMPDL3A     CCDC65      OSTALPHA    TRAT1       NCOA3       RPS5
SORBS3      TMEM150     NCOA3       RARRES3     OSTALPHA    RASGRP1     RASGRP1
TST         FAM26F      PDCD4       SIRPG       NCOA3       RPS6        RPL18A
RPL18A      TBCC        SLBP        SORBS3      RPL18A      PECI        RPSA
SS18L2      TCEA3       RPL10A      RPL30       RPS15A      EFCBP1      TMEM150
CDO1        ITGAX       GZMK        NDST2       RCN2        TMEM150     ZNF195
RPS15A      PTPN1       RPL12       SLBP        RTP4        NMT2        SLC26A6
TMEM150     TFB1M       PRKCQ       RTP4        SLC2A11     SLC26A6     SLC2A11
RPL32       AMD1        HLA-DRB3    RPL11       HLA-DRB3    RPL11       PRKCQ
SLC26A6     KIAA1881    OPLAH       ZNF32       C13ORF23    PPP2R2B     HLA-DRB3
RPS20       CD59        UPP1        RPS20       LOC647099   ZNF32       TCTN1
HLA-DRB3    KIF1B       KIAA1881    HLA-DRB3    OPLAH       ACVR1B      ITGAX
TCTN1       RPL19       P4HB        ACVR1B      ANKS1A      TFB1M       AMD1
P4HB        NAPB        RPS13       C13ORF23    RPL19       P4HB        DNAJC5
RPL15       ZDHHC19     WDR37       DACH1       WWP1        RPL15       GOLGA1
RPS13       EXT1        ZNF17       AMD1        ARG1        ZNF17       CD59
WDR37       ZNF608      ANKS1A      SLC26A8     CKAP4       EIF3D       RPS15
ANKS1A      TBC1D8      CD59        WDR37       EXOC7       MMP9        ARG1
KIF1B       RRBP1       CKAP4       KIF1B       PGD         SLC36A1     EMILIN2
MMP9        ATP8B4      PFKFB2      WWP1        HK3         NAPB        HIBADH
EXOC7       RPS3A       RPL22       RPS25       RRBP1       ARID5B      MUC1
CMTM4       RPL27A      ZC3H3       RPL24       RPL26       HK3         ZC3H3
RPL24       PHTF1       ZDHHC19     ZNF608      FAM160A2    CSGALNACT2  ZDHHC19
CSGALNACT2  FBXO34      NLRC4       RRBP1       CTDP1       FAM160A2    HK3
ATP2A2      CYP1B1      RRBP1       NSUN7       RPL27A      RPS3A       PHCA
FAM160A2    RPL4        RPL26       TDRD9       NT5DC2      RPL27A      NSUN7

Having regard to FIG. 4, 98 subsets of different numbers of biomarkers were formed from the entire panel of 266 genes, wherein each subset was characterised by only including genes of a particular level of abundance in the top 1000 subsets selected from the 44,14. The subsets range from a single gene (LDLR, which was by far the most abundant) to the whole panel of 266 biomarkers. Where multiple genes have the same abundance they are all included in a single subset associated with that threshold, which is why there are fewer subsets than there are total genes. The trend, derived from a more basic ANN than that detailed in Table 2, shows an increase in predictive accuracy with increasing subset size, which begins to plateau at a subset size of 97 genes. Here a maximum predictive accuracy of 97.0% is reached. When the subset size exceeds 149 genes the predictive accuracy begins to drop off, falling to 89.6%. This trend does not match that obtained using a random forest, although here we examine single defined subsets rather than many random ones. Of the three subsets reaching the maximum predictive accuracy, the smallest one (subset 50) contains genes that appeared 165 times or more in the top 1000 subsets of the test.
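The abundance-threshold construction described here can be sketched in Python as follows. This is a hedged illustration, not the study's code: `gene_abundance` and `threshold_subsets` are hypothetical names, the gene labels and accuracies are toy values, and each subset is assumed to contain every gene whose count in the top subsets is at or above a given threshold (consistent with the "165 times or more" wording).

```python
from collections import Counter


def gene_abundance(scored_subsets, top_n):
    """Count how often each gene occurs in the top_n most accurate subsets.

    scored_subsets is a list of (accuracy, genes) pairs.
    """
    ranked = sorted(scored_subsets, key=lambda pair: pair[0], reverse=True)
    return Counter(gene for _, genes in ranked[:top_n] for gene in genes)


def threshold_subsets(abundance):
    """Build one nested subset per distinct abundance value.

    Each subset holds every gene whose count is at or above the threshold,
    so genes with tied counts always enter together.
    """
    thresholds = sorted(set(abundance.values()), reverse=True)
    return [
        sorted(g for g, count in abundance.items() if count >= t)
        for t in thresholds
    ]


# Toy example with hypothetical gene names and accuracies (not study data).
scored = [
    (0.97, ["G1", "G2"]),
    (0.95, ["G1", "G3"]),
    (0.90, ["G1", "G4"]),
    (0.60, ["G5", "G2"]),
]
abundance = gene_abundance(scored, top_n=3)
subsets = threshold_subsets(abundance)
```

In the toy data four genes yield only two subsets because three genes share the same count, which mirrors why 266 genes give 98 subsets. Genes that never appear in the top subsets are absent from the counter and would need a zero count added for the final subset to cover the whole panel.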

Through use of systems-scale profiling technology, this study identified biomarker signatures predictive of the development of post-operative sepsis with high accuracy in a sizable blinded validation set. The key to analysing complicated data sets is the method of analysis. Our approach was predicated on the conclusion that no one biomarker is likely to be predictive of sepsis in humans. Previous studies of pre-symptomatic biomarker expression have shown that conventional linear analyses of biomarker expression may fail to reveal differences between the two patient groups, i.e. sepsis and non-sepsis patients. A variety of non-linear techniques were used with varying degrees of success to differentiate between the transcriptomes of patients who go on to develop sepsis and their comparators. Random Forests and SVM demonstrated some use for the differentiation of transcriptomes from different patient groups. However, ANN analysis using 25 and 44 gene biomarker signatures performed excellently. These gave high predictive accuracies as well as high sensitivities and specificities when differentiating between patients who went on to develop sepsis and their comparators. Furthermore, the biomarker signatures derived were very robust when tested against potentially confounding transcriptomes from patients who had SIRS but who did not go on to develop sepsis.

The strong performance of non-linear techniques is perhaps not unexpected, since immune markers fluctuate greatly over the entire course of sepsis. It is unlikely that analysis using simple linear techniques could be used as easily to pick out key biomarker signatures.

It is also worth noting that the successful testing of patient transcriptomes through use of a multiplexed q RT-PCR indicates the suitability of this technology for further development as a diagnostic assay.

The functional relevance of a subset of transcripts constituting this signature, which is broadly associated with the coordinated molecular and cellular chain of events involved during inflammation and sepsis, instils confidence in our results. Indeed, activation of the complement pathway has been shown to play an important role in sepsis and inflammation. Conversely, dendritic cells and other antigen presenting cells have been shown to disappear from the circulation during septic episodes, which may account for the observed decrease in abundance of transcripts associated with MHC gene expression. It should be noted that the majority of the candidate markers identified through this unbiased global profiling approach have not been as well characterized as these few functionally enriched “landmark” transcripts.

1. A diagnostic kit for predicting development of sepsis prior to onset of symptoms, comprising reagents for detecting nucleic acid gene products or expression levels of members of a biomarker signature consisting of 266 genes of Table I in a sample obtained from a patient at a risk of sepsis, wherein the reagents comprise one or more of fluorescently labelled oligonucleotide probes or fluorescently labeled primers, wherein the fluorescently labelled oligonucleotide probes or fluorescently labeled primers consist of probes and primers each capable of specific binding and detection of gene products of at least 25 of the members of the biomarker signature, and wherein the at least 25 of the members of the biomarker signature comprise C16ORF7, C5ORF39, C9ORF103, CD177, FCER1A, GAS7, LOC285176, MYBPC3, NDST2, EBI2, RPL13A, RPL18A, RPL32, RPL36, RPL9, RPS20, RPS29, RPS6, SIGIRR, TCEA3, TCTN1, TIMM9, TOMM7, ZFAND1 and ZNHIT3.
2. The kit of claim 1, wherein the nucleic acid gene products are transcribed ribonucleic acids or cDNA.
3. The kit of claim 1, wherein the patient is a post-surgical patient, an immunocompromised individual, an intensive-care patient or a burn patient.
4. The kit of claim 1, wherein the at least 25 of the members of the biomarker signature further comprise ATP9A, CACNA1E, DHRS3, EEF1B2, FLT3LG, GRB10, HLA.DMA, HS.445036, IL1R1, IL1R2, NCOA3, RPL10A, LOC646483, RPL18, SLBP, SLC26A6, SMPDL3A, SORBS3, THBS3 and THNSL1.
5. The kit of claim 1, wherein the at least 25 of the members of the biomarker signature consist of all 266 genes of Table 1.
6. The kit of claim 1, wherein the reagents comprise the fluorescently labelled oligonucleotide probes.
7. The kit of claim 1, wherein the reagents comprise the fluorescently labelled primers.
8. A system for analysis of a biological sample obtained from a patient at risk of sepsis to predict or monitor development of sepsis in the patient, the system comprising: a detector for monitoring, measuring or detecting the nucleic acid gene products or the expression levels of the members of a biomarker signature; the kit of claim 1; and a computer processor configured to analyze data produced by the detector, and to provide an output.
9. A diagnostic kit for predicting development of sepsis prior to onset of symptoms, comprising reagents for detecting gene products or expression levels of members of a biomarker signature in a sample obtained from a patient at a risk of sepsis, wherein the biomarker signature consists of 266 genes of Table I, wherein the reagents comprise a microarray with immobilized probes, the probes consisting of probes each suitable for specific binding to and detection of gene products of at least 25 of the members of the biomarker signature, and wherein the at least 25 of the members of the biomarker signature comprise C16ORF7, C5ORF39, C9ORF103, CD177, FCER1A, GAS7, LOC285176, MYBPC3, NDST2, EBI2, RPL13A, RPL18A, RPL32, RPL36, RPL9, RPS20, RPS29, RPS6, SIGIRR, TCEA3, TCTN1, TIMM9, TOMM7, ZFAND1 and ZNHIT3.
10. The kit of claim 9, wherein the gene products are nucleic acids.
11. The kit of claim 10, wherein the nucleic acids are transcribed ribonucleic acids or cDNA.
12. The kit of claim 9, wherein the gene products are proteins.
13. The kit of claim 9, wherein the patient is a post-surgical patient, an immunocompromised individual, an intensive-care patient or a burn patient.
14. The kit of claim 9, wherein the at least 25 of the members of the biomarker signature further comprise ATP9A, CACNA1E, DHRS3, EEF1B2, FLT3LG, GRB10, HLA.DMA, HS.445036, IL1R1, IL1R2, NCOA3, RPL10A, LOC646483, RPL18, SLBP, SLC26A6, SMPDL3A, SORBS3, THBS3 and THNSL1.
15. The kit of claim 9, wherein the at least 25 of the members of the biomarker signature consist of all 266 genes of Table 1.
16. A system for analysis of a biological sample obtained from a patient at risk of developing sepsis to predict or monitor development of sepsis in the patient, the system comprising: a detector for monitoring, measuring or detecting the gene products or the expression levels of the member of a biomarker signature; the kit of claim 9; and a computer processor configured to analyze data produced by the detector, and to provide an output.